JeffersonLab / chroma

The Chroma Software System for Lattice QCD
http://jeffersonlab.github.io/chroma

Chroma does not build in double precision and soalen=2 on AVX2 #41

Closed martin-ueding closed 2 years ago

martin-ueding commented 6 years ago

We want to simulate a lattice with Lx = 28. After checkerboarding, this extent becomes 14, which must be divisible by the SoA length. Since 14 is divisible by neither 4 nor 8, we need a build with soalen = 2. We also want double precision, and the code has to run on KNL, where the vector length is 8 for double and 16 for float.
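
The divisibility constraint can be checked with a few lines of Python. This is only a sketch: the function names and the candidate list (2, 4, 8, 16) are chosen here for illustration and are not part of Chroma or QPhiX.

```python
def checkerboarded_extent(lx: int) -> int:
    """Return the x extent after even/odd checkerboarding (Lx / 2)."""
    if lx % 2 != 0:
        raise ValueError("Lx must be even to be checkerboarded")
    return lx // 2


def compatible_soalens(lx: int, candidates=(2, 4, 8, 16)) -> list:
    """Return the candidate SoA lengths that divide the checkerboarded extent."""
    half = checkerboarded_extent(lx)
    return [soalen for soalen in candidates if half % soalen == 0]


print(compatible_soalens(28))  # [2]
print(compatible_soalens(32))  # [2, 4, 8, 16]
```

For Lx = 28 only soalen = 2 survives, which is why the larger SoA lengths are not an option here.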

Looking at codegen/jinja/isa.js, we see that AVX512 does not offer the SoA length that we need:

    "avx512": {
        "fptypes": {
            "double": {"veclen": 8, "soalens": [4, 8]},
            "float": {"veclen": 16, "soalens": [4, 8, 16]},
            "half": {"veclen": 16, "soalens": [4, 8, 16]}
        },
        "extra_includes_global": ["immintrin.h"],
        "extra_includes_local": []
    },

However, with AVX2 we have this option:

    "avx2": {
        "fptypes": {
            "double": {"veclen": 4, "soalens": [2, 4]},
            "float": {"veclen": 8, "soalens": [4, 8]},
            "half": {"veclen": 8, "soalens": [4, 8]}
        },
        "extra_includes_global": ["immintrin.h"],
        "extra_includes_local": ["qphix_codegen/avx_utils.h"]
    },
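
To see which combinations of ISA and floating-point type can serve Lx = 28, the two entries quoted above can be searched directly. A rough Python sketch follows; the table is copied from the snippets (only the `fptypes` part), and the helper name `usable_kernels` is made up for this illustration.

```python
# SoA lengths per ISA and floating-point type, copied from the isa.js excerpts.
ISA_TABLE = {
    "avx512": {
        "double": {"veclen": 8, "soalens": [4, 8]},
        "float": {"veclen": 16, "soalens": [4, 8, 16]},
        "half": {"veclen": 16, "soalens": [4, 8, 16]},
    },
    "avx2": {
        "double": {"veclen": 4, "soalens": [2, 4]},
        "float": {"veclen": 8, "soalens": [4, 8]},
        "half": {"veclen": 8, "soalens": [4, 8]},
    },
}


def usable_kernels(isa_table: dict, lx: int) -> list:
    """List (isa, fptype, soalen) triples whose soalen divides Lx / 2."""
    half = lx // 2
    found = []
    for isa, fptypes in isa_table.items():
        for fptype, spec in fptypes.items():
            for soalen in spec["soalens"]:
                if half % soalen == 0:
                    found.append((isa, fptype, soalen))
    return found


print(usable_kernels(ISA_TABLE, 28))  # [('avx2', 'double', 2)]
```

The only match is AVX2 in double precision. In particular, no float entry lists soalen = 2, so a float kernel with that SoA length is never generated.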

The problem is that Chroma seems to compile with both single and double precision kernels and there is no single precision kernel with SoA length 2:

    /usr/local/software/jurecabooster/Stages/2018a/software/impi/2018.2.199-iccifort-2018.2.199-GCC-5.5.0/bin64/mpiicpc -I/homec/hbn28/hbn28e/Chroma-jureca-booster-avx2-soa2/sources/chroma/mainprogs/main -I/homec/hbn28/hbn28e/Chroma-jureca-booster-avx2-soa2/sources/chroma/lib -I../../lib  -xCORE-AVX2 -O3 -fopenmp -std=c++11 -I/homec/hbn28/hbn28e/Chroma-jureca-booster-avx2-soa2/local-icc/include -I/homec/hbn28/hbn28e/Chroma-jureca-booster-avx2-soa2/local-icc/include/libxml2 -I/usr/local/software/jurecabooster/Stages/2018a/software/GMP/6.1.2-GCCcore-5.5.0/include -xCORE-AVX2 -O3 -fopenmp -std=c++11    -I/homec/hbn28/hbn28e/Chroma-jureca-booster-avx2-soa2/local-icc/include       -xCORE-AVX2 -O3 -fopenmp -std=c++11 -L../../lib  -L/homec/hbn28/hbn28e/Chroma-jureca-booster-avx2-soa2/local-icc/lib -L/usr/local/software/jurecabooster/Stages/2018a/software/GMP/6.1.2-GCCcore-5.5.0/lib     -L/homec/hbn28/hbn28e/Chroma-jureca-booster-avx2-soa2/local-icc/lib -L../../other_libs/qdp-lapack/lib       -o chroma chroma.o -lchroma  -lqdp -lXPathReader -lxmlWriter -lqio -llime -L/homec/hbn28/hbn28e/Chroma-jureca-booster-avx2-soa2/local-icc/lib -lxml2 -lm -lqmp -lqmp -lintrin -lfiledb -lfilehash -lgmp      -lqphix_solver -lqphix_codegen  -lqdp-lapack
    ../../lib/libchroma.a(syssolver_linop_clover_qphix_w.o): In function `QPhiX::ClovDslash<float, 8, 2, true>::completeFaceDir(int, float const*, float (*) [3][4][2][2], float const (*) [8][2][3][2][8], QPhiX::Types<float, 8, 2, true>::CloverBlock const*, double, int, int, int, bool)':
    syssolver_linop_clover_qphix_w.cc:(.text._ZN5QPhiX10ClovDslashIfLi8ELi2ELb1EE15completeFaceDirEiPKfPA3_A4_A2_A2_fPA8_A2_A3_A2_A8_S2_PKNS_5TypesIfLi8ELi2ELb1EE11CloverBlockEdiiib[_ZN5QPhiX10ClovDslashIfLi8ELi2ELb1EE15completeFaceDirEiPKfPA3_A4_A2_A2_fPA8_A2_A3_A2_A8_S2_PKNS_5TypesIfLi8ELi2ELb1EE11CloverBlockEdiiib]+0x45e): undefined reference to `void QPhiX::face_clov_finish_dir_plus<float, 8, 2, true, true>(float const*, QPhiX::Types<float, 8, 2, true>::SU3MatrixBlock const*, QPhiX::Types<float, 8, 2, true>::FourSpinorBlock*, QPhiX::Types<float, 8, 2, true>::CloverBlock const*, int const*, int const*, int, int, int, int, float, unsigned int, int, float const (*) [2])'
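
The interesting part of the linker message is the template argument list `<float, 8, 2, true>`. The sketch below pulls these out of a log with a regular expression. It assumes that the first three arguments are the floating-point type, the vector length and the SoA length, in that order; this reading matches the AVX2 table (float has veclen 8), but the function name and the interpretation of the fourth, boolean argument are not taken from the log and should be treated as assumptions.

```python
import re

# Matches e.g. "QPhiX::ClovDslash<float, 8, 2, true>" and captures the arguments.
TEMPLATE_ARGS = re.compile(r"QPhiX::(\w+)<(\w+), (\d+), (\d+), (true|false)>")


def kernel_instantiations(linker_output: str) -> list:
    """Return the distinct (fptype, veclen, soalen) triples named in the log."""
    found = set()
    for match in TEMPLATE_ARGS.finditer(linker_output):
        _name, fptype, veclen, soalen, _flag = match.groups()
        found.add((fptype, int(veclen), int(soalen)))
    return sorted(found)


message = (
    "In function `QPhiX::ClovDslash<float, 8, 2, true>::completeFaceDir(int, "
    "float const*, QPhiX::Types<float, 8, 2, true>::CloverBlock const*)'"
)
print(kernel_instantiations(message))  # [('float', 8, 2)]
```

The result shows that the link step asks for a float kernel with veclen 8 and soalen 2, a combination that the AVX2 entry does not list.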

More than a year ago, before the great refactoring of QPhiX, this would have compiled: the kernel code was header-only, and there was a default definition that merely raised a runtime error. One could therefore still build this configuration and just had to be careful not to invoke the single-precision kernels. Now we have to decide what we want.

martin-ueding commented 6 years ago

I have tried to figure out where the float comes from. The template parameter is taken as WordType<T> in Chroma, where T is something I haven't figured out yet. Since I configure QDP++ with --enable-precision=double, I would expect all the types to be double.

The inner solver precision is also set to double. So the mixed precision solver should not cause the reference to those float-kernels either.

Could you tell me where I would have to look?

bjoo commented 6 years ago

Hi Martin, apologies for the late response; I have had some personal disruptions.

I guess the issue is whether you want to use the mixed-precision solver. For KNL we should have only SOA=4,8,16 for float and 2,4,8 for double. For AVX I thought the matrix was SOA=4,8 for float and SOA=2,4 for double. The mixed-precision solver is always compiled, and so may be causing you grief. However, you should be able to use flags to control the inner precision type and soalen, so a mixed-precision solver with double outer and double inner precision should then build fine with soalen=2 (at least in principle).

Are you setting all of these flags?

Outer (outer precision is picked up from base precision):

    --enable-qphix-soalen=2

Inner:

    --enable-qphix-solver-inner-type=double
    --enable-qphix-solver-inner-soalen=2

If this persists in being a problem, another workaround is to just not build the mixed prec solver. I can add an enable flag to do that.

martin-ueding commented 6 years ago

I have checked these compilation flags, and I do believe that the solver should indeed be built with double-double precision. Still, it requires the QPhiX kernels in float.

Having a configure flag for building this solver sounds like a good workaround. It would be better if no float-kernels were loaded, but that might be harder to track down.