Closed huangxs48 closed 4 years ago
I got the same errors (for my hydro problem with the -ipo option) with Intel/18.0.1, but the errors disappear with Intel/18.0.3. Have you tried intel/20.0 to see whether it solves your problem?
@lucyundead I tried intel/20.0 (paired with intelmpi/20.0) on our school's cluster before, and it crashed the same way. But it sounds promising! Which MPI library and compiler were you using?
I use mpiicpc from intel/18.0.3 and it works fine with the -ipo option. The CPU we use is a Xeon Gold 6230. I also tested the case without -ipo, and the option does not seem to affect the efficiency of the code.
@lucyundead Thank you! Unfortunately the clusters I'm using don't have intel/18.0.3, but it's good to know that some version of the compiler works! For the problems I tested (a few simple MHD problems), using -ipo seems to improve the speed only by a factor of ~1.5-2 on the same machine, so maybe it's not a big deal.
I just tested on Stampede2 with a hydro shock tube, and I can confirm there's a bug that seems to initially manifest near block boundaries. It's either an effectively unfixable bug in the code that only appears with some compilers (the latest development branch has the same problem), or a bug in the compiler itself. Either way, given that one can't usually choose the compiler on a cluster, I think the only viable workaround is to remove -ipo from the Makefile.
Good to have this confirmed. The problems I tried usually crash when main.cpp calls NewBlockTimeStep(). But memory bugs are subtle, and the behavior may change with different configurations. I guess I'll just drop the -ipo option.
I compiled with -g and stepped through until it crashed. Indeed, some incorrect optimizations are happening. I also tried -ip on several versions, and all work well. Perhaps we can keep -ip instead.
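For anyone who wants to script the -ipo to -ip swap rather than edit the generated Makefile by hand, here is a minimal sketch. It assumes the flag appears as a standalone, whitespace-delimited token in the Makefile text; the helper name `swap_ipo` and the sample flags line are hypothetical and are not taken from the actual Makefile.

```python
import re

# Match "-ipo" only as a whole whitespace-delimited token, so that longer
# options such as "-ipo-separate" are left alone.
_IPO_TOKEN = re.compile(r"(?<!\S)-ipo(?!\S)")


def swap_ipo(makefile_text: str, replacement: str = "-ip") -> str:
    """Replace each standalone -ipo token with `replacement`.

    Pass replacement="" to drop the flag entirely instead of using -ip.
    """
    return _IPO_TOKEN.sub(replacement, makefile_text)


if __name__ == "__main__":
    # Illustrative flags line only; a real Makefile will differ.
    sample = "CXXFLAGS := -O3 -std=c++11 -ipo -xMIC-AVX512\n"
    print(swap_ipo(sample), end="")
```

To apply it, read the Makefile into a string, pass it through `swap_ipo`, and write the result back before running make. Passing `replacement=""` gives the "just drop -ipo" workaround discussed above.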
Hi,
This might be an issue that should go to the computing platforms' consulting teams, but I got similar behavior on different clusters, so I'm wondering whether it is related to the code. (It also looks related to Issue #55.)
I'm trying to run the shock tube test compiled with the --cxx=icc-phi option (and also with -hdf5). However, the interprocedural optimization (-ipo) flag in the default configuration seems to cause trouble. With the -ipo flag on, the built binary throws a segmentation fault before entering the first calculation cycle. Keeping everything else the same but deleting the -ipo flag produces a binary that works normally.
I tested shock_tube.cpp (in 1D and 2D Cartesian) and disk.cpp (in 1D and 2D cylindrical) on the KNL nodes of Stampede2 and another cluster (xMIC-AVX512 architecture, intel/18.0 and intel/20.0) and on Stampede2's Skylake node (xCORE-AVX512 architecture, intel/18.0), and got the same seg fault at the initialization stage. I also tested them on a cluster's Cascade Lake node (xCORE-AVX512 architecture, intel/18.0); there the build produces a binary that doesn't crash, but the timestep drops to a small value after the initial cycle.
The code runs without crashing for the above configurations and problem generators if I use periodic boundaries, but no other default boundary condition works. The seg fault (unfortunately but classically) goes away if I turn on the debugging option. The MHD problems are fine with -ipo. It looks like a subtle issue that only appears with hydro problems + Intel's -ipo optimization.
Thank you!