Open ndkeen opened 3 weeks ago
I built this myself, and I don't think the output above is relevant (as far as I can tell, it's just related to the parallel build getting killed). The relevant output is:
NVFORTRAN-S-0038-Symbol, topographic_wave_drag, has not been explicitly declared (/pscratch/sd/x/xylar/e3sm_scratch/pm-gpu/SMS_Ld1.T62_oEC60to30v3.CMPASO-NYF.pm-gpu_nvidiagpu.20240613_020012_785h6a/bld/cmake-bld/core_ocean/shared/mpas_ocn_diagnostics_variables.f90: 1023)
This appears to be caused by https://github.com/E3SM-Project/E3SM/pull/6310, which removed the topographic_wave_drag field but missed the OpenACC directive on that line.
After fixing the above, I'm now seeing:
NVFORTRAN-S-1061-Procedures called in a compute region must have acc routine information - ocn_subgrid_ssh_lookup (/pscratch/sd/x/xylar/e3sm_scratch/pm-gpu/SMS_Ld1.T62_oEC60to30v3.CMPASO-NYF.pm-gpu_nvidiagpu.20240613_024017_46es2q/bld/cmake-bld/core_ocean/shared/mpas_ocn_diagnostics.f90: 2307)
/global/common/software/nersc/pm-2022q4/spack/linux-sles15-zen/cmake-3.24.3-k5msymx/bin/cmake -E cmake_copy_f90_mod mpas-framework/src/ocn_tracer_advection_mono.mod mpas-framework/src/CMakeFiles/ocn.dir/ocn_tracer_advection_mono.mod.stamp NVHPC
ocn_diagnostic_solve_z_coordinates:
2307, Accelerator restriction: call to 'ocn_subgrid_ssh_lookup' with no acc routine information
This next issue seems to have been introduced by https://github.com/E3SM-Project/E3SM/pull/6288, and it's going to be more of a challenge to address. It seems like it's caused by calling ocn_subgrid_ssh_lookup within an OpenACC loop without having added the required directives.
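For reference, here is a minimal sketch of the kind of directive the NVFORTRAN-S-1061 message is asking for. It assumes the called routine can run sequentially within each device thread. `ssh_lookup_sketch`, its arguments, and its body are hypothetical stand-ins for illustration only, not the real ocn_subgrid_ssh_lookup interface.

```fortran
module subgrid_sketch
   implicit none
contains
   ! Hypothetical stand-in for ocn_subgrid_ssh_lookup; the real routine's
   ! arguments and logic are different.
   subroutine ssh_lookup_sketch(volume, ssh)
      ! Ask the compiler to also build a device version of this routine,
      ! executed sequentially by whichever thread calls it.
      !$acc routine seq
      real(kind=8), intent(in)  :: volume
      real(kind=8), intent(out) :: ssh
      ssh = 0.5d0 * volume   ! placeholder arithmetic, not the real lookup
   end subroutine ssh_lookup_sketch
end module subgrid_sketch

program demo
   use subgrid_sketch
   implicit none
   integer, parameter :: nCells = 4
   real(kind=8) :: volume(nCells), ssh(nCells)
   integer :: iCell

   volume = [1.0d0, 2.0d0, 3.0d0, 4.0d0]

   ! Without the routine directive above, this call inside a compute
   ! region is what triggers "no acc routine information".
   !$acc parallel loop copyin(volume) copyout(ssh)
   do iCell = 1, nCells
      call ssh_lookup_sketch(volume(iCell), ssh(iCell))
   end do

   print *, ssh
end program demo
```

Whether `seq` is the right level of parallelism for the real routine, and whether any module data it reads also needs to be made available on the device, would have to be checked against the actual code.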
@sbrus89, I made #6471 to fix the first issue. Could you make a PR to fix the second one?
Separate PRs probably make sense because the two issues are unrelated to each other, but we won't be able to test either fix on its own because the test isn't currently compiling.
@ndkeen, this appears to be fixed now: https://my.cdash.org/tests/175231189
The test SMS_Ld1.T62_oEC60to30v3.CMPASO-NYF.pm-gpu_nvidiagpu has been failing for a while now. I think I mentioned this to @jonbob, who said the failure dates matched a PR that recently went in. I thought I had made an issue, but maybe not.