PrincetonUniversity / athena

Athena++ radiation GRMHD code and adaptive mesh refinement (AMR) framework
https://www.athena-astro.app
BSD 3-Clause "New" or "Revised" License

Trigger PR test on new Jenkins server and fix all violations #518

Closed · changgoo closed this 1 year ago

changgoo commented 1 year ago

This PR tests the automatic Jenkins PR trigger.

Prerequisite checklist

changgoo commented 1 year ago

retest this on stellar

changgoo commented 1 year ago

@tomidakn

The pgen_compile regression test failed with

src/multigrid/multigrid_driver.cpp:876:21: error: implicit conversion loses integer precision: 'long' to 'int' [-Werror,-Wshorten-64-to-32]
    int i = loc.lx1 + ngh;
        ~   ~~~~~~~~^~~~~
src/multigrid/multigrid_driver.cpp:877:21: error: implicit conversion loses integer precision: 'long' to 'int' [-Werror,-Wshorten-64-to-32]
    int j = loc.lx2 + ngh;
        ~   ~~~~~~~~^~~~~
src/multigrid/multigrid_driver.cpp:878:21: error: implicit conversion loses integer precision: 'long' to 'int' [-Werror,-Wshorten-64-to-32]
    int k = loc.lx3 + ngh;
        ~   ~~~~~~~~^~~~~
felker commented 1 year ago
icpx  -O3 -std=c++11 -ipo -xhost -qopenmp-simd  -diag-disable 3180 -Wall -Wextra -Werror -pedantic -pedantic-errors -Wno-unused-private-field -Wno-unused-variable -Wno-unused-parameter -Wno-unknown-pragmas -Wno-unused-function -Wshorten-64-to-32 -c src/multigrid/mgbval_zerofixed.cpp -o /scratch/gpfs/changgoo/jenkins/workspace/tigris/athena-PR/tst/regression/obj/mgbval_zerofixed.o
...

src/mesh/mesh_refinement.cpp:831:22: error: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Werror,-Wpass-failed=transform-warning]
void MeshRefinement::ProlongateInternalField(
                     ^
felker commented 1 year ago

@changgoo what is the icpx --version on Stellar that you were using from intel/2021.1.2 ? I am using Intel(R) oneAPI DPC++/C++ Compiler 2023.1.0 (2023.1.0.20230320)

icpc still compiles fine with all the errors and warning flags enabled.

felker commented 1 year ago

I assume the change I just made in bb702bfc fixes a typo, right @tomo-ono ?

felker commented 1 year ago

@tomidakn do you know if this is a typo or should I just delete the first line? https://github.com/PrincetonUniversity/athena/blob/053e833e1a47f954fbc113941b19ef5536528660/src/pgen/strat.cpp#L224-L225

changgoo commented 1 year ago

> @changgoo what is the icpx --version on Stellar that you were using from intel/2021.1.2 ? I am using Intel(R) oneAPI DPC++/C++ Compiler 2023.1.0 (2023.1.0.20230320)
>
> icpc still compiles fine with all the errors and warning flags enabled.

The version I'm using is

Intel(R) oneAPI DPC++ Compiler 2021.1.2 (2020.10.0.1214)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/intel/oneapi/compiler/2021.1.2/linux/bin
changgoo commented 1 year ago

On stellar, there is one version newer than that:

Intel(R) oneAPI DPC++/C++ Compiler 2022.2.0 (2022.2.0.20220730)
tomo-ono commented 1 year ago

> I assume the change I just made in bb702bf fixes a typo, right @tomo-ono ?

@felker Yes. I think so.

felker commented 1 year ago

@changgoo can you try the newer one? The 2023 release I was using on a Cascade Lake CPU did not throw an error for the #pragma unroll

After removing all std::isnan() function calls, the only error I had left for my compiler version was:

src/hydro/rsolvers/hydro/hllc.cpp:32:13: error: loop not distributed: failed explicitly specified loop distribution [-Werror,-Wpass-failed]
void Hydro::RiemannSolver(const int k, const int j, const int il, const int iu,

Which was fixed by commenting out https://github.com/PrincetonUniversity/athena/blob/053e833e1a47f954fbc113941b19ef5536528660/src/hydro/rsolvers/hydro/hllc.cpp#L50

We could also/instead suppress -Wpass-failed errors.

felker commented 1 year ago

@tomidakn @c-white can you double-check the changes I made to the GR EOS and AMR cost calculations? The compiler identified them as unused, but maybe that was not intentional.

tomidakn commented 1 year ago

> @tomidakn do you know if this is a typo or should I just delete the first line?
>
> https://github.com/PrincetonUniversity/athena/blob/053e833e1a47f954fbc113941b19ef5536528660/src/pgen/strat.cpp#L224-L225

I checked the history, and I believe this is a typo and you can simply remove the first line. In the original implementation in the old Athena, it was like the first line. However, this is incompatible with the isothermal EOS and I guess Ji-Min changed it to the second line.

felker commented 1 year ago

https://www.intel.com/content/www/us/en/developer/articles/guide/porting-guide-for-icc-users-to-dpcpp-or-icx.html

Important Compiler Options Mapping

> On stellar, there is one version newer than that:
>
> Intel(R) oneAPI DPC++/C++ Compiler 2022.2.0 (2022.2.0.20220730)

What is the module for this?

felker commented 1 year ago

Or rather, do all of these modules exist? I am just guessing and assume that the HDF5 one is wrong:

module load anaconda3/2020.11 intel/2022.2.0 intel-mpi/intel/2022.2.0 hdf5/intel-2022.2.0.10.6 fftw/intel-2022.0/3.3.9
changgoo commented 1 year ago

I updated module load to this

module load anaconda3/2023.3 intel/2022.2.0 intel-mpi/intel/2021.7.0 hdf5/intel-2021.1/1.10.6 fftw/intel-2021.1/3.3.9

The hdf5 and fftw modules are not compiled against the latest version of the Intel compiler.

felker commented 1 year ago

Regarding the problem with std::isnan(): ICX by default uses a fast floating-point model similar to GCC's -ffast-math, and it assumes that NaNs and Infs are not operands or results of math functions. isnan() is therefore always optimized out to 0.

This was also true of the ICC Intel Classic Compiler (but not of Clang). So every call to the function triggers this error:

src/fft/turbulence.cpp:440:7: error: explicit comparison with NaN in fast floating point mode [-Werror,-Wtautological-constant-compare]
  if (std::isnan(s)) std::cout << "[perturb]: s is NaN!" << std::endl;
      ^~~~~~~~~~~~~

Athena++ calls the function only in turbulence.cpp and gr_torus.cpp.

See discussion here: https://discourse.llvm.org/t/should-isnan-be-optimized-out-in-fast-math-mode/58888/22 and here: https://github.com/dealii/dealii/issues/14693

c-white commented 1 year ago

@felker The GR EOS variable was indeed unnecessary -- it was only used for MHD.

felker commented 1 year ago

@tomidakn is it best to initialize these variables to 0 or 1?

    int nbx, nby, nbz;
    for (int l = 0; l < 20; l++) {
      if (size_.nx1%(1<<l) == 0 && size_.nx2%(1<<l) == 0 && size_.nx3%(1<<l) == 0) {
        nbx=size_.nx1/(1<<l), nby=size_.nx2/(1<<l), nbz=size_.nx3/(1<<l);
        nlevel_=l+1;
      }
    }
    int nmaxr = std::max(nbx, std::max(nby, nbz));
    // int nminr=std::min(nbx, std::min(nby, nbz)); // unused variable
    if (nmaxr != 1 && Globals::my_rank == 0) {
      std::cout
          << "### Warning in Multigrid::Multigrid" << std::endl
          << "The root grid can not be reduced to a single cell." << std::endl
          << "Multigrid should still work, but this is not the"
          << " most efficient configuration"
          << " as the coarsest level is not solved exactly but iteratively." << std::endl;
    }
felker commented 1 year ago

I suppose it will always perform the first l=0 iteration of the loop, but initializing the three variables to 1 or nx1, nx2, nx3 makes the most sense?

felker commented 1 year ago

@changgoo is the job stuck in the queue or just taking a long time to run? Now I see the downside of the Jenkins server since I am on the other side of the Princeton wall :)

We might want to fiddle with the ICX compiler flags and the pragmas more at some point. I am especially concerned about -Wno-pass-failed: we would theoretically want to know when the oneAPI compiler is having trouble unrolling, etc.

If you move #pragma distribute_point from outside the HLLC loop to somewhere in the middle, it works fine.

felker commented 1 year ago

Never mind, just failed on some SR tests with ICX:

Results:
    sr.hydro_convergence: failed; time elapsed: 36.8 s
    sr.hydro_shocks_hllc: failed; time elapsed: 38.1 s
    sr.hydro_shocks_hlle: failed; time elapsed: 36 s
...
    sr.mhd_shocks_hlle: failed; time elapsed: 41.1 s
felker commented 1 year ago

@c-white mind taking a look? I am guessing something is not safely vectorizing in SR with the new Intel compilers?

wave_flag = 4
epsilons low/high:
[3.11071510e-06 1.03690550e-06 3.15312430e-07 4.10445107e-08
 6.84075176e-09]

[3.68679283e-06 1.22893101e-06 3.73704680e-07 4.86453980e-08
 8.10756633e-09]
epsilon low/high: 1.4732871267861087, 1.7461271479208567
epsilon_high / epsilon_low = 1.1851913426610416 is greater than
(float(res_low) / float(res_high))**cutoff = 0.02368307135172497

vs g++

wave_flag = 4
epsilons low/high:
[6.09570663e-09 2.03190472e-09 6.17883221e-10 8.04298503e-11
 1.34049825e-11]

[7.10139152e-11 2.36719058e-11 7.19814258e-12 9.37058807e-13
 1.56175311e-13]
epsilon low/high: 0.0028870297711365, 3.363346847291481e-05
felker commented 1 year ago

Indeed, removing -qopenmp-simd fixes it.

tomidakn commented 1 year ago

> I suppose it will always perform the first l=0 iteration of the loop, but initializing the three variables to 1 or nx1, nx2, nx3 makes the most sense?

They can be initialized with 1 or size_.nx1 etc.

felker commented 1 year ago

@c-white issue is with this pragma on the Cascade Lake CPU I was using: https://github.com/PrincetonUniversity/athena/blob/053e833e1a47f954fbc113941b19ef5536528660/src/hydro/rsolvers/hydro/hllc_rel.cpp#L151 (and presumably the same in HLLE for SR hydro and MHD). Note, SIMD_LENGTH=8 on this system. Also fails with fixed SIMD_LENGTH=4 or 2, or even just #pragma omp simd. Might be specific to this compiler version and/or hardware revision.