kostrzewa closed this issue 3 years ago
Yes, I faced similar problems when using IntelMPI19 on both machines. I switched back to IntelMPI18, which seems to be fine.
I see, thanks! So we get consistent behaviour. I switched back all the way to stage-2018a on Juwels (ICC 2018 / Intel MPI 2018) and this seems to work as expected. Did you do the same or did you use ICC 2019 and IntelMPI 2018?
On SuperMUC I am currently using icc 18.0.5. I checked on JUWELS: in the last runs I was using the modules "ParaStationMPI/5.2.1-1" with "Intel/2019.0.117-GCC-7.3.0" (which is icc 19). I initially ran with Intel 18 (icc + IntelMPI) because of the scaling problems of IntelMPI19 on JUWELS at the beginning of 2019 (not sure whether they have been solved). I think on SuperMUC I switched directly to Intel 2018; however, I would like to re-run some benchmarks using the updated environment with the energy-optimization options (maybe I can revisit IntelMPI19 at that point).
I recently ran some checks on Juwels with icc/2019.3.199-GCC-8.3.0. After reducing the compiler optimization to -O2, DDalphaAMG no longer exits with signal 11 (with -O3 the issue is still there).
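For anyone who wants to try the same workaround, a rebuild with the optimization level lowered to -O2 might look like the sketch below. The variable names (`CC`, `OPT_VERSION_FLAGS`) are assumptions about a typical Makefile-based build and may not match the actual DDalphaAMG Makefile; adjust them to your checkout.

```shell
# Rebuild DDalphaAMG at -O2 to work around the icc 19 signal-11 crash.
# NOTE: variable names below are illustrative, not taken from the real Makefile.
make clean
make -j 8 CC=mpiicc OPT_VERSION_FLAGS="-O2 -xHost"
```

The idea is simply to override whichever Makefile variable carries the `-O3` flag on the command line, so no source changes are needed to test whether the crash is optimization-dependent.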
Interesting, thanks for the update! Might be some instruction reordering thing then...
@Finkenrath just a follow-up question: did you compile just DDalphaAMG with -O2, or both the library and tmLQCD?
I tested it only with the DDalphaAMG executable and I haven't run tmLQCD yet. If I get some results on that I will let you know.
I see, sounds good, thanks.
@sbacchio @Finkenrath I'm putting the finishing touches on the code to run on Juwels / SuperMUC-NG for new production / continuation runs. With the current software stage (not sure yet about other stages in combination with this particular version), the DDalphaAMG setup in the HMC (triggered by a light monomial) exits with signal 11. Did you observe anything similar on SuperMUC-NG?