Closed bjoo closed 2 years ago
NB: I was trying this on our cluster with the Intel compiler. With a double precision build using --enable-sse2 --enable-sse3 in QDP++ and in the Chroma Dslash, several regressions fail. I am investigating by enabling/disabling SSE and OpenMP in QDP++ and Chroma to track it down.
t_leapfrog FAIL t_leapfrog.prec_1flav_clover.candidate.xml
t_leapfrog FAIL t_leapfrog.prec_clover_stout-rel-cg-multiprec.candidate.xml
t_leapfrog FAIL t_leapfrog.prec_clover_stout-cg-lf-clover.candidate.xml
t_leapfrog FAIL t_leapfrog.prec_clover_stout-richardson-multiprec.candidate.xml
t_leapfrog FAIL t_leapfrog.prec_clover_stout-rel-bicgstab-multiprec.candidate.xml
t_leapfrog FAIL t_leapfrog.prec_clover_stout-rel-ibicgstab-multiprec.candidate.xml
t_leapfrog FAIL t_leapfrog.prec_clover_stout-ibicgstab.candidate.xml
t_leapfrog FAIL t_leapfrog.sts_min_norm_2_dtau.candidate.xml
t_leapfrog FAIL t_leapfrog.tst_min_norm_2_dtau.candidate.xml
t_leapfrog FAIL t_leapfrog.unprec_clover.candidate.xml
t_leapfrog FAIL t_leapfrog.prec_2flav_clover.candidate.xml
t_leapfrog FAIL t_leapfrog.prec_2flav_clover.sfnonpt.candidate.xml
t_leapfrog FAIL t_leapfrog.lw.sfnonpt.candidate.xml
t_leapfrog FAIL t_leapfrog.prec_2flav_clover.ee_oo_candidate.xml
t_leapfrog FAIL t_leapfrog.rect_gaugeact.candidate.xml
t_leapfrog FAIL t_leapfrog.rect_gaugeact_1.candidate.xml
t_leapfrog FAIL t_leapfrog.rect_gaugeact_c1t0.candidate.xml
t_leapfrog FAIL t_leapfrog.rect_gaugeact_omit2linkT.candidate.xml
t_leapfrog FAIL t_leapfrog.rect_gaugeact_aniso.candidate.xml
t_leapfrog FAIL t_leapfrog.two_plaq_spatial_gaugeact.candidate.xml
t_leapfrog FAIL t_leapfrog.aniso_spectrum.candidate.xml
t_leapfrog FAIL t_leapfrog.prec_clover_stout.candidate.xml
t_leapfrog FAIL t_leapfrog.prec_slrc.candidate.xml
t_leapfrog FAIL t_leapfrog.prec_slrc.sfnonpt.candidate.xml
t_leapfrog FAIL t_leapfrog.aniso_sym_spatial_plus_temporal.log.xml
t_leapfrog FAIL t_leapfrog.aniso_sym_spatial.candidate.xml
t_leapfrog FAIL t_leapfrog.aniso_sym_temporal.candidate.xml
purgaug FAIL purgaug.candidate.xml
purgaug FAIL purgaug.sfnonpt.candidate.xml
purgaug FAIL purgaug.2+2.candidate.xml
purgaug FAIL purgaug.2+2.1loop.candidate.xml
NB: The various propagators for the smearing combinations all PASS, so the issue is likely in the Force term / Gauge term (similar to what Brendan Fahy spotted/reported).
OK. Disabling sse2 and sse3 in QDP++ only (leaving them on in Chroma) seems to work: all those tests now pass.
OK, I've fixed some of this in QDP++ commit dfb24b4fe525bbe2b4e71dd71ba01bd93dfa66ac. Now parscalar-single-intel and parscalar-single-intel double regressions pass.
The issue had to do with an initialization of the __m128d type using a union in this fashion.
typedef union {
  __m128d v;
  double d[2];
} VD;
One could then do
VD x = { a, b };
however, in the CCMUL and CCMADD macros something went awry. I changed the initialization to just
__m128d x = _mm_set_pd( b, a );
and of course changed instances of 'x.v' to just 'x'
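For reference, here is a minimal sketch of the lane ordering that the replacement relies on. It is not the QDP++ code: `pack_pd`, `lane`, and `check_lane_order` are hypothetical helper names used only for illustration. `_mm_set_pd` takes the high element first, so `_mm_set_pd( b, a )` puts `a` in the low lane and `b` in the high lane, matching the `d[0] = a`, `d[1] = b` layout of the union.

```cpp
#include <cassert>
#include <emmintrin.h>  // SSE2: __m128d, _mm_set_pd, _mm_storeu_pd

// Hypothetical helper: pack (a, b) so that a is in the low lane and b in
// the high lane. _mm_set_pd takes its arguments high-to-low, hence (b, a).
inline __m128d pack_pd(double a, double b) {
  return _mm_set_pd(b, a);
}

// Hypothetical helper: read lane i (0 = low, 1 = high) through a plain
// unaligned store, avoiding any union-based access.
inline double lane(__m128d x, int i) {
  double out[2];
  _mm_storeu_pd(out, x);
  return out[i];
}

// Small self-check of the lane order described above.
inline void check_lane_order() {
  const __m128d x = pack_pd(1.5, -2.0);
  assert(lane(x, 0) == 1.5);   // low lane holds a
  assert(lane(x, 1) == -2.0);  // high lane holds b
}
```

A check like this only confirms the ordering of the intrinsic; it does not explain why the union initialization misbehaved inside the macros.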
I am not sure why this generated the wrong code originally, since many of the tests still use this kind of initialization. NB: This has the potential to hurt expressions of the form
adj(m1)*adj(m2)
like those seen by Brendan Fahy, so this fix may solve his issues too.
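One way to test an expression of this form independently of the SSE path is to compare against a plain scalar reference, using the identity adj(m1)*adj(m2) = adj(m2*m1). The sketch below assumes 3x3 complex matrices and uses only `std::complex`; `Mat3`, `adj`, `mul`, and `adj_product_residual` are illustrative names, not the QDP++ API.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>

using Cplx = std::complex<double>;
using Mat3 = std::array<std::array<Cplx, 3>, 3>;

// Hermitian conjugate: transpose plus complex conjugation.
inline Mat3 adj(const Mat3& m) {
  Mat3 r{};
  for (std::size_t i = 0; i < 3; ++i)
    for (std::size_t j = 0; j < 3; ++j)
      r[i][j] = std::conj(m[j][i]);
  return r;
}

// Plain scalar matrix product, no vectorization assumed.
inline Mat3 mul(const Mat3& a, const Mat3& b) {
  Mat3 r{};
  for (std::size_t i = 0; i < 3; ++i)
    for (std::size_t j = 0; j < 3; ++j)
      for (std::size_t k = 0; k < 3; ++k)
        r[i][j] += a[i][k] * b[k][j];
  return r;
}

// Largest elementwise |adj(m1)*adj(m2) - adj(m2*m1)|; should be at
// rounding level for a correct implementation.
inline double adj_product_residual(const Mat3& m1, const Mat3& m2) {
  const Mat3 lhs = mul(adj(m1), adj(m2));
  const Mat3 rhs = adj(mul(m2, m1));
  double worst = 0.0;
  for (std::size_t i = 0; i < 3; ++i)
    for (std::size_t j = 0; j < 3; ++j)
      worst = std::max(worst, std::abs(lhs[i][j] - rhs[i][j]));
  return worst;
}
```

In practice the left-hand side would come from the optimized kernel under test and the right-hand side from the scalar reference; here both sides are scalar, so the sketch only shows the shape of the comparison.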
I have had several issue reports involving the Intel compiler, probably related to QDP++ under Chroma (perhaps I should move / cross-list this issue to a QDP++ tracker if we ever get one):
Brendan Fahy reported this in March:
This does not yet implicate the high optimization level (sent query to Brendan) but Jie also had a similar issue which we did track to using -O3. This made me think that Brendan's issue may have also been due to -O3 vs -O2.
Finally, Will Detmold reported incorrect solver convergence (and in fact nonconvergence) on Edison at NERSC, using a configuration created on Intrepid (Argonne BG/P). He was trying to continue the run. He tried a variety of optimization combinations:
I would add to this that I suspect the issue is in a .cc file in QDP++, since if it were in a .h file, then the (op Chroma) would likely also be bad.
However, Will's test hopefully used a double prec build. I don't know if Jie and Brendan saw this issue in double prec or not.