Closed changgoo closed 1 year ago
retest this on stellar
@tomidakn The `pgen_compile` regression test failed with:
```
src/multigrid/multigrid_driver.cpp:876:21: error: implicit conversion loses integer precision: 'long' to 'int' [-Werror,-Wshorten-64-to-32]
    int i = loc.lx1 + ngh;
        ~   ~~~~~~~~^~~~~
src/multigrid/multigrid_driver.cpp:877:21: error: implicit conversion loses integer precision: 'long' to 'int' [-Werror,-Wshorten-64-to-32]
    int j = loc.lx2 + ngh;
        ~   ~~~~~~~~^~~~~
src/multigrid/multigrid_driver.cpp:878:21: error: implicit conversion loses integer precision: 'long' to 'int' [-Werror,-Wshorten-64-to-32]
    int k = loc.lx3 + ngh;
        ~   ~~~~~~~~^~~~~
icpx -O3 -std=c++11 -ipo -xhost -qopenmp-simd -diag-disable 3180 -Wall -Wextra -Werror -pedantic -pedantic-errors -Wno-unused-private-field -Wno-unused-variable -Wno-unused-parameter -Wno-unknown-pragmas -Wno-unused-function -Wshorten-64-to-32 -c src/multigrid/mgbval_zerofixed.cpp -o /scratch/gpfs/changgoo/jenkins/workspace/tigris/athena-PR/tst/regression/obj/mgbval_zerofixed.o
...
src/mesh/mesh_refinement.cpp:831:22: error: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Werror,-Wpass-failed=transform-warning]
void MeshRefinement::ProlongateInternalField(
     ^
```
@changgoo what is the `icpx --version` on Stellar that you were using from `intel/2021.1.2`? I am using `Intel(R) oneAPI DPC++/C++ Compiler 2023.1.0 (2023.1.0.20230320)`; `icpc` still compiles fine with all the errors and warning flags enabled.
I assume the change I just made in bb702bfc fixes a typo, right @tomo-ono ?
@tomidakn do you know if this is a typo or should I just delete the first line? https://github.com/PrincetonUniversity/athena/blob/053e833e1a47f954fbc113941b19ef5536528660/src/pgen/strat.cpp#L224-L225
> @changgoo what is the `icpx --version` on Stellar that you were using from `intel/2021.1.2`? I am using `Intel(R) oneAPI DPC++/C++ Compiler 2023.1.0 (2023.1.0.20230320)`; `icpc` still compiles fine with all the errors and warning flags enabled.
The version I'm using is:

```
Intel(R) oneAPI DPC++ Compiler 2021.1.2 (2020.10.0.1214)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/intel/oneapi/compiler/2021.1.2/linux/bin
```
On Stellar, there is one version newer than that:

```
Intel(R) oneAPI DPC++/C++ Compiler 2022.2.0 (2022.2.0.20220730)
```
> I assume the change I just made in bb702bf fixes a typo, right @tomo-ono ?
@felker Yes. I think so.
@changgoo can you try the newer one? The 2023 release I was using on a Cascade Lake CPU did not throw an error for the `#pragma unroll`.
After removing all `std::isnan()` function calls, the only error I had left for my compiler version was:

```
src/hydro/rsolvers/hydro/hllc.cpp:32:13: error: loop not distributed: failed explicitly specified loop distribution [-Werror,-Wpass-failed]
void Hydro::RiemannSolver(const int k, const int j, const int il, const int iu,
```
Which was fixed by commenting out https://github.com/PrincetonUniversity/athena/blob/053e833e1a47f954fbc113941b19ef5536528660/src/hydro/rsolvers/hydro/hllc.cpp#L50
We could also/instead suppress `-Wpass-failed` errors.
`icx`, `icpx` in https://github.com/PrincetonUniversity/athena/blob/master/tst/ci/set_warning_cflag.sh

@tomidakn @c-white can you double-check the changes I made to the GR EOS and AMR cost calculations? The compiler identified them as unused, but maybe that was not intentional.
> @tomidakn do you know if this is a typo or should I just delete the first line?
I checked the history, and I believe this is a typo and you can simply remove the first line. In the original implementation in the old Athena, it was like the first line. However, this is incompatible with the isothermal EOS and I guess Ji-Min changed it to the second line.
> On Stellar, there is one version newer than that:
> Intel(R) oneAPI DPC++/C++ Compiler 2022.2.0 (2022.2.0.20220730)
What is the module for this?
Or rather, do all of these modules exist? I am just guessing, and I assume that the HDF5 one is wrong:

```
module load anaconda3/2020.11 intel/2022.2.0 intel-mpi/intel/2022.2.0 hdf5/intel-2022.2.0.10.6 fftw/intel-2022.0/3.3.9
```
I updated `module load` to this:

```
module load anaconda3/2023.3 intel/2022.2.0 intel-mpi/intel/2021.7.0 hdf5/intel-2021.1/1.10.6 fftw/intel-2021.1/3.3.9
```

The hdf5 and fftw modules are not compiled with the latest version of the Intel compiler.
Regarding the problem with `std::isnan()`: ICX by default uses a floating-point mode similar to GCC's fast mode, and it assumes that NaNs and Infs are not operands or results of math functions. `isnan()` is always optimized out to 0. This was also true in the ICC Intel Classic Compiler (but not for Clang). So all calls of the function throw this warning:

```
src/fft/turbulence.cpp:440:7: error: explicit comparison with NaN in fast floating point mode [-Werror,-Wtautological-constant-compare]
      if (std::isnan(s)) std::cout << "[perturb]: s is NaN!" << std::endl;
          ^~~~~~~~~~~~~
```
Athena++ only calls the function in `turbulence.cpp` and `gr_torus.cpp`.
See discussion here: https://discourse.llvm.org/t/should-isnan-be-optimized-out-in-fast-math-mode/58888/22 and here: https://github.com/dealii/dealii/issues/14693
@felker The GR EOS variable was indeed unnecessary -- it was only used for MHD.
@tomidakn is it best to initialize these variables to 0 or 1?

```c++
int nbx, nby, nbz;
for (int l = 0; l < 20; l++) {
  if (size_.nx1%(1<<l) == 0 && size_.nx2%(1<<l) == 0 && size_.nx3%(1<<l) == 0) {
    nbx=size_.nx1/(1<<l), nby=size_.nx2/(1<<l), nbz=size_.nx3/(1<<l);
    nlevel_=l+1;
  }
}
int nmaxr = std::max(nbx, std::max(nby, nbz));
// int nminr=std::min(nbx, std::min(nby, nbz)); // unused variable
if (nmaxr != 1 && Globals::my_rank == 0) {
  std::cout
      << "### Warning in Multigrid::Multigrid" << std::endl
      << "The root grid can not be reduced to a single cell." << std::endl
      << "Multigrid should still work, but this is not the"
      << " most efficient configuration"
      << " as the coarsest level is not solved exactly but iteratively." << std::endl;
}
```

I suppose it will always perform the first `l=0` iteration of the loop, but initializing the three variables to 1 or `nx1, nx2, nx3` makes the most sense?
@changgoo is the job stuck in the queue or just taking a long time to run? Now I see the downside of the Jenkins server, since I am on the other side of the Princeton wall :)

We might want to fiddle with the ICX compiler flags and the pragmas more at some point. I am especially concerned about `-Wno-pass-failed`: we theoretically would want to know when the oneAPI compiler is having trouble unrolling, etc.
If you move `#pragma distribute_point` from outside the HLLC loop to somewhere in the middle, it works fine.
Never mind, it just failed on some SR tests with ICX:

```
Results:
  sr.hydro_convergence: failed; time elapsed: 36.8 s
  sr.hydro_shocks_hllc: failed; time elapsed: 38.1 s
  sr.hydro_shocks_hlle: failed; time elapsed: 36 s
  ...
  sr.mhd_shocks_hlle: failed; time elapsed: 41.1 s
```
@c-white mind taking a look? I am guessing something is not safely vectorizing in SR with the new Intel compilers?
```
wave_flag = 4
epsilons low/high:
[3.11071510e-06 1.03690550e-06 3.15312430e-07 4.10445107e-08
 6.84075176e-09]
[3.68679283e-06 1.22893101e-06 3.73704680e-07 4.86453980e-08
 8.10756633e-09]
epsilon low/high: 1.4732871267861087, 1.7461271479208567
epsilon_high / epsilon_low = 1.1851913426610416 is greater than
(float(res_low) / float(res_high))**cutoff = 0.02368307135172497
```

vs. g++:

```
wave_flag = 4
epsilons low/high:
[6.09570663e-09 2.03190472e-09 6.17883221e-10 8.04298503e-11
 1.34049825e-11]
[7.10139152e-11 2.36719058e-11 7.19814258e-12 9.37058807e-13
 1.56175311e-13]
epsilon low/high: 0.0028870297711365, 3.363346847291481e-05
```
Indeed, removing `-qopenmp-simd` fixes it.
> I suppose it will always perform the first `l=0` iteration of the loop, but initializing the three variables to 1 or `nx1, nx2, nx3` makes the most sense?

They can be initialized with 1 or `size_.nx1` etc.
@c-white the issue is with this pragma on the Cascade Lake CPU I was using: https://github.com/PrincetonUniversity/athena/blob/053e833e1a47f954fbc113941b19ef5536528660/src/hydro/rsolvers/hydro/hllc_rel.cpp#L151 (and presumably the same in HLLE for SR hydro and MHD).

Note, `SIMD_LENGTH=8` on this system. It also fails with a fixed `SIMD_LENGTH=4` or 2, or even just `#pragma omp simd`. It might be specific to this compiler version and/or hardware revision.
This PR tests the automatic Jenkins PR trigger.