ESCOMP / CAM

Community Atmosphere Model

Runtime error with NVHPC compiler for CAM6_3_125 #945

Open · sjsprecious opened this issue 10 months ago

sjsprecious commented 10 months ago

What happened?

After Rich Loft pointed out the issue with complex number accessors in the NVHPC compiler (https://github.com/ESCOMP/CAM/issues/881), I was able to build cam6_3_125 with the nvhpc/23.7 compiler on Derecho using his temporary workaround. However, the simulation failed at runtime with an unclear error message.

My debugging traced the issue to these lines: https://github.com/ESCOMP/CAM/blob/cam_development/src/physics/cam/aerosol_optics_cam.F90#L38-L43.

If I change the current implementation from:

  type aero_props_t
     class(aerosol_properties), pointer :: obj => null()
  end type aero_props_t
  type aero_state_t
     class(aerosol_state), pointer :: obj => null()
  end type aero_state_t

to:

  type aero_props_t
     type(modal_aerosol_properties), pointer :: obj => null()
  end type aero_props_t
  type aero_state_t
     type(modal_aerosol_state), pointer :: obj => null()
  end type aero_state_t

Then the simulation built by the NVHPC compiler finishes successfully on Derecho.

I am not familiar with modern Fortran features, so I cannot tell whether this is a bug in the code or, once again, incomplete support for the modern Fortran standard in the NVHPC compiler. Maybe @fvitt could provide more information here.
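One way to separate a code bug from a compiler limitation is to reduce the pattern to a standalone module and build it with each compiler. The sketch below is hypothetical: `props_base`, `modal_props`, `nbins`, `poly_wrapper`, and `concrete_wrapper` are illustrative stand-ins, not CAM's actual `aerosol_properties` interfaces. It declares both the original wrapper style (a `class(...)` pointer to an abstract base type) and the workaround style (a `type(...)` pointer to one concrete extension).

```fortran
! Hypothetical standalone reproducer. All names are illustrative stand-ins,
! not CAM's actual aerosol_properties / modal_aerosol_properties interfaces.
module wrapper_repro_mod
  implicit none
  private
  public :: props_base, modal_props, poly_wrapper, concrete_wrapper

  ! Abstract base type with one deferred type-bound function
  type, abstract :: props_base
   contains
     procedure(nbins_if), deferred :: nbins
  end type props_base

  abstract interface
     integer function nbins_if(self)
       import :: props_base
       class(props_base), intent(in) :: self
     end function nbins_if
  end interface

  ! Concrete extension standing in for the modal representation
  type, extends(props_base) :: modal_props
     integer :: nmodes = 4
   contains
     procedure :: nbins => modal_nbins
  end type modal_props

  ! Original pattern: polymorphic pointer to the abstract base type
  type poly_wrapper
     class(props_base), pointer :: obj => null()
  end type poly_wrapper

  ! Workaround pattern: pointer to one concrete extension only
  type concrete_wrapper
     type(modal_props), pointer :: obj => null()
  end type concrete_wrapper

contains

  integer function modal_nbins(self)
    class(modal_props), intent(in) :: self
    modal_nbins = self%nmodes
  end function modal_nbins

end module wrapper_repro_mod
```

Both wrapper styles are standard Fortran 2003. Whether a program this small reproduces the CAM failure under nvfortran has not been checked; if it builds and runs cleanly, the reproducer would likely need to grow toward the real code (for example, wrappers allocated in one module and dereferenced from another) before it isolates the problem.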

What are the steps to reproduce the bug?

What CAM tag were you using?

cam6_3_125

What machine were you running CAM on?

CISL machine (e.g. cheyenne)

What compiler were you using?

NVHPC

Path to a case directory, if applicable

/glade/derecho/scratch/sunjian/SMS_Ln9.f09_f09_mg17.F2000dev.derecho_nvhpc.cam-outfrq9s.20231227_132550_dfpib6

Will you be addressing this bug yourself?

No

Extra info

No response

fvitt commented 10 months ago

@sjsprecious The code in aerosol_optics_cam needs to be able to work with any aerosol representation (modal, carma, etc.). Your proposed changes break the generality of the code in aerosol_optics_cam: it would work only with the modal aerosol representation. Using abstract aerosol class pointers allows us to use different aerosol representations in aerosol_optics_cam. This code (without your changes) works without issues with the modal and carma aerosol representations when compiled with the intel, gnu, and nag compilers.
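The generality argument can be illustrated with a sketch, again using hypothetical stand-in names rather than CAM's real types. A consumer written only against the abstract base type (`total_bins` here) handles a modal-like and a sectional, CARMA-like representation through the same `class(...)` pointer wrapper, without `select type` and without naming either concrete type. The bin counts (4 and 20) are arbitrary illustrative values.

```fortran
! Illustrative sketch (hypothetical names): code written against the abstract
! base type works with any extension stored in the class(...) pointer wrapper.
module aero_generic_mod
  implicit none
  private
  public :: props_base, modal_props, carma_props, props_wrapper, total_bins

  type, abstract :: props_base
   contains
     procedure(nbins_if), deferred :: nbins
  end type props_base

  abstract interface
     integer function nbins_if(self)
       import :: props_base
       class(props_base), intent(in) :: self
     end function nbins_if
  end interface

  ! Stand-in for a modal representation: a few modes
  type, extends(props_base) :: modal_props
     integer :: nmodes = 4
   contains
     procedure :: nbins => modal_nbins
  end type modal_props

  ! Stand-in for a sectional (CARMA-like) representation: many size bins
  type, extends(props_base) :: carma_props
     integer :: nsections = 20
   contains
     procedure :: nbins => carma_nbins
  end type carma_props

  ! Wrapper holding any extension of props_base
  type props_wrapper
     class(props_base), pointer :: obj => null()
  end type props_wrapper

contains

  integer function modal_nbins(self)
    class(modal_props), intent(in) :: self
    modal_nbins = self%nmodes
  end function modal_nbins

  integer function carma_nbins(self)
    class(carma_props), intent(in) :: self
    carma_nbins = self%nsections
  end function carma_nbins

  ! Generic consumer: never names a concrete type, so it needs no change
  ! when a new aerosol representation is added. Null entries are skipped.
  integer function total_bins(list)
    type(props_wrapper), intent(in) :: list(:)
    integer :: i
    total_bins = 0
    do i = 1, size(list)
       if (associated(list(i)%obj)) total_bins = total_bins + list(i)%obj%nbins()
    end do
  end function total_bins

end module aero_generic_mod
```

Declaring `obj` as `type(modal_props), pointer` instead would make it impossible to store a `carma_props` object in the wrapper, which is the loss of generality described in the comment above.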

sjsprecious commented 5 months ago

@fvitt @cacraigucar I just recalled this pending NVHPC issue and found that the runtime error occurs only with the FV dycore. With the same setup, the SE dycore, and the cam6_3_154 tag, the run finished successfully. Is the aerosol_optics_cam.F90 code used only with the FV dycore?

brian-eaton commented 5 months ago

Hi @sjsprecious. In anticipation of adding GPU tests to CAM's regression suite, I wanted to check the current status of running with the nvhpc compilers on Derecho. We are currently using nvhpc/23.7. I ran your tests without any code modifications in cam6_3_161 (the latest tag). The FV test fails to build aerosol_optics_cam.F90, as discussed in issue #881. The SE test builds and runs successfully, as you reported.

The aerosol code should not depend on the dycore, and as far as I can tell from reading the code, it does not. Since we don't plan to support the FV dycore in CAM7, I think it's safe to move forward without a compiler fix for the FV configuration. However, when I turn on debug flags, the SE test fails to build in the CLM component. To check whether CAM itself builds with debug flags, I ran the test with a QPC6 (cam6 aquaplanet) compset, and that test failed to build fvm_consistent_se_cslam.F90, which is an SE-specific file.

Do we have access to updated nvhpc compilers on derecho?

sjsprecious commented 5 months ago

Hi @brian-eaton, thanks for your detailed reply. The cam6_3_161 tag uses the ccs_config_cesm0.0.106 tag, which loads the nvhpc/24.3 module on Derecho; that is the latest compiler available on NCAR's machines.

I agree that the aerosol code should not depend on the dycore, and the nvhpc compiler's behavior here is really strange. I had not tried the DEBUG option with the nvhpc compiler before, so I did not know there was a problem even with the SE dycore. Can you post the QPC6 error here? Thanks.

brian-eaton commented 5 months ago

I'm afraid the error message was not informative:

nvfortran-Fatal-/glade/u/apps/common/23.08/spack/opt/spack/nvhpc/24.3/Linux_x86_64/24.3/compilers/bin/tools/fort1 TERMINATED by signal 11
gmake: *** [/glade/derecho/scratch/eaton/cime-tests/cam6_3_161-240606-084013/SMS_D_Ln9.ne30pg3_ne30pg3_mg17.QPC6.derecho_nvhpc.cam-outfrq9s.GC.203180/Tools/Makefile:1001: fvm_consistent_se_cslam.o] Error 2

Since we don't have much experience with the nvhpc compilers, it may be that the debug flag options being used need to be improved.

sjsprecious commented 5 months ago

Thanks @brian-eaton for posting the error. I could reproduce your error on Derecho with nvhpc/24.3 and debug option on.

I switched to nvhpc/23.7 and generated another test with the same setup. It gave some more information, shown below:

nvfortran-Fatal-/glade/u/apps/common/23.08/spack/opt/spack/nvhpc/23.7/Linux_x86_64/23.7/compilers/bin/tools/fort1 TERMINATED by signal 11
Arguments to /glade/u/apps/common/23.08/spack/opt/spack/nvhpc/23.7/Linux_x86_64/23.7/compilers/bin/tools/fort1
/glade/u/apps/common/23.08/spack/opt/spack/nvhpc/23.7/Linux_x86_64/23.7/compilers/bin/tools/fort1 /glade/derecho/scratch/sunjian/cam_gpu/src/dynamics/se/dycore/fvm_consistent_se_cslam.F90 -debug -x 120 0x8000 -opt 0 -terse 1 -inform warn -nostatic -x 19 0x400000 -quad -x 59 4 -x 15 2 -x 49 0x400004 -x 51 0x20 -x 57 0x4c -x 58 0x10000 -x 124 0x1000 -y 129 2 -x 129 0x8000 -tp zen3 -x 57 0xfb0000 -x 58 0x78031040 -x 47 0x08 -x 48 4608 -x 49 0x100 -stdinc /glade/u/apps/common/23.08/spack/opt/spack/nvhpc/23.7/Linux_x86_64/23.7/compilers/include:/glade/u/apps/common/23.08/spack/opt/spack/nvhpc/23.7/Linux_x86_64/23.7/compilers/include-stdexec:/usr/lib64/gcc/x86_64-suse-linux/7/include:/usr/local/include:/usr/lib64/gcc/x86_64-suse-linux/7/include-fixed:/usr/lib64/gcc/x86_64-suse-linux/7/../../../../x86_64-suse-linux/include:/usr/include -cmdline '+nvfortran /glade/derecho/scratch/sunjian/cam_gpu/src/dynamics/se/dycore/fvm_consistent_se_cslam.F90 -tp=zen3 -D__CRAY_X86_MILAN -D__CRAYXT_COMPUTE_LINUX_TARGET -c -I. 
-I/glade/derecho/scratch/sunjian/SMS_D_Ln9.ne30pg3_ne30pg3_mg17.QPC6.derecho_nvhpc.cam-outfrq9s.20240606_123033_fbc8en/bld/nvhpc/mpich/debug/nothreads/nuopc/CDEPS/fox/include -I/glade/derecho/scratch/sunjian/SMS_D_Ln9.ne30pg3_ne30pg3_mg17.QPC6.derecho_nvhpc.cam-outfrq9s.20240606_123033_fbc8en/bld/nvhpc/mpich/debug/nothreads/nuopc/CDEPS/dshr -I/glade/derecho/scratch/sunjian/SMS_D_Ln9.ne30pg3_ne30pg3_mg17.QPC6.derecho_nvhpc.cam-outfrq9s.20240606_123033_fbc8en/bld/nvhpc/mpich/debug/nothreads/nuopc/include -I/glade/derecho/scratch/sunjian/SMS_D_Ln9.ne30pg3_ne30pg3_mg17.QPC6.derecho_nvhpc.cam-outfrq9s.20240606_123033_fbc8en/bld/nvhpc/mpich/debug/nothreads/nuopc/nuopc/esmf/c1a1o1/include -I/glade/derecho/scratch/sunjian/SMS_D_Ln9.ne30pg3_ne30pg3_mg17.QPC6.derecho_nvhpc.cam-outfrq9s.20240606_123033_fbc8en/bld/nvhpc/mpich/debug/nothreads/nuopc/finclude -I/glade/u/apps/derecho/23.09/spack/opt/spack/netcdf/4.9.2/cray-mpich/8.1.27/nvhpc/23.7/pebp/include -I/glade/u/apps/derecho/23.09/spack/opt/spack/parallel-netcdf/1.12.3/cray-mpich/8.1.27/nvhpc/23.7/dxl5/include -I/glade/u/apps/cseg/derecho/23.09/spack/opt/spack/linux-sles15-x86_64_v3/nvhpc-23.7/parallelio-2.6.2-wsmqqsn6khspxuqsuu4ndxdojyhi5f7w/include -I/glade/derecho/scratch/sunjian/SMS_D_Ln9.ne30pg3_ne30pg3_mg17.QPC6.derecho_nvhpc.cam-outfrq9s.20240606_123033_fbc8en/bld/nvhpc/mpich/debug/nothreads/nuopc/include -I/glade/derecho/scratch/sunjian/SMS_D_Ln9.ne30pg3_ne30pg3_mg17.QPC6.derecho_nvhpc.cam-outfrq9s.20240606_123033_fbc8en/bld/lnd/obj -I/glade/derecho/scratch/sunjian/SMS_D_Ln9.ne30pg3_ne30pg3_mg17.QPC6.derecho_nvhpc.cam-outfrq9s.20240606_123033_fbc8en/bld/lnd/obj -I. 
-I/glade/derecho/scratch/sunjian/SMS_D_Ln9.ne30pg3_ne30pg3_mg17.QPC6.derecho_nvhpc.cam-outfrq9s.20240606_123033_fbc8en/SourceMods/src.cam -I/glade/derecho/scratch/sunjian/SMS_D_Ln9.ne30pg3_ne30pg3_mg17.QPC6.derecho_nvhpc.cam-outfrq9s.20240606_123033_fbc8en/SourceMods/src.cam -I/glade/derecho/scratch/sunjian/cam_gpu/src/unit_drivers -I/glade/derecho/scratch/sunjian/cam_gpu/src/unit_drivers/stub -I/glade/derecho/scratch/sunjian/cam_gpu/src/infrastructure -I/glade/derecho/scratch/sunjian/cam_gpu/src/chemistry/pp_trop_mam4 -I/glade/derecho/scratch/sunjian/cam_gpu/src/chemistry/modal_aero -I/glade/derecho/scratch/sunjian/cam_gpu/src/chemistry/aerosol -I/glade/derecho/scratch/sunjian/cam_gpu/src/ionosphere -I/glade/derecho/scratch/sunjian/cam_gpu/src/chemistry/mozart -I/glade/derecho/scratch/sunjian/cam_gpu/src/hemco -I/glade/derecho/scratch/sunjian/cam_gpu/src/hemco/HEMCO/src/Shared -I/glade/derecho/scratch/sunjian/cam_gpu/src/hemco/HEMCO/src/Shared/Headers -I/glade/derecho/scratch/sunjian/cam_gpu/src/hemco/HEMCO/src/Shared/GeosUtil -I/glade/derecho/scratch/sunjian/cam_gpu/src/hemco/HEMCO/src/Shared/NcdfUtil -I/glade/derecho/scratch/sunjian/cam_gpu/src/hemco/HEMCO/src/Core -I/glade/derecho/scratch/sunjian/cam_gpu/src/hemco/HEMCO/src/Extensions -I/glade/derecho/scratch/sunjian/cam_gpu/src/hemco/HEMCO/src/Interfaces/Shared -I/glade/derecho/scratch/sunjian/cam_gpu/src/chemistry/utils -I/glade/derecho/scratch/sunjian/cam_gpu/src/physics/rrtmg -I/glade/derecho/scratch/sunjian/cam_gpu/src/physics/rrtmg/aer_src -I/glade/derecho/scratch/sunjian/cam_gpu/src/physics/clubb/src/CLUBB_core -I/glade/derecho/scratch/sunjian/cam_gpu/src/physics/pumas-frozen -I/glade/derecho/scratch/sunjian/cam_gpu/src/physics/cam -I/glade/derecho/scratch/sunjian/cam_gpu/src/atmos_phys/zhang_mcfarlane -I/glade/derecho/scratch/sunjian/cam_gpu/src/dynamics/se -I/glade/derecho/scratch/sunjian/cam_gpu/src/dynamics/se/dycore -I/glade/derecho/scratch/sunjian/cam_gpu/src/dynamics/tests 
-I/glade/derecho/scratch/sunjian/cam_gpu/src/cpl/nuopc -I/glade/derecho/scratch/sunjian/cam_gpu/src/control -I/glade/derecho/scratch/sunjian/cam_gpu/src/utils -I/glade/derecho/scratch/sunjian/cam_gpu/src/utils/cam_ccpp -I/glade/derecho/scratch/sunjian/cam_gpu/src/atmos_phys/utilities -I/glade/derecho/scratch/sunjian/SMS_D_Ln9.ne30pg3_ne30pg3_mg17.QPC6.derecho_nvhpc.cam-outfrq9s.20240606_123033_fbc8en/bld/lib/include -Mnofma -Mnonv-fma -i4 -gopt -time -Mextend -byteswapio -Mflushz -Kieee -O0 -g -traceback -Ktrap=fp -Mbounds -Kieee -I/glade/u/apps/cseg/derecho/23.09/spack/opt/spack/linux-sles15-x86_64_v3/nvhpc-23.7/esmf-8.6.0-ra2gu3rr6jfachugfbjy4im557ezfgce/include -I/glade/u/apps/cseg/derecho/23.09/spack/opt/spack/linux-sles15-x86_64_v3/nvhpc-23.7/esmf-8.6.0-ra2gu3rr6jfachugfbjy4im557ezfgce/include -I/glade/work/csgteam/spack-deployments/derecho/23.09/envs/build/opt/__spack_path_placeholder__/__spack_path_placeholder__/__spack/netcdf-c/4.9.2/cray-mpich/8.1.27/nvhpc/23.7/c5du/include -I/glade/u/apps/derecho/23.09/spack/opt/spack/netcdf-fortran/4.6.1/cray-mpich/8.1.27/nvhpc/23.7/mlml/include -I/glade/u/apps/cseg/derecho/23.09/spack/opt/spack/linux-sles15-x86_64_v3/nvhpc-23.7/parallelio-2.6.2-wsmqqsn6khspxuqsuu4ndxdojyhi5f7w/include -DPLON=1 -DPLAT=1 -DNUM_COMP_INST_ATM=1 -DNUM_COMP_INST_LND=1 -DNUM_COMP_INST_OCN=1 -DNUM_COMP_INST_ICE=1 -DNUM_COMP_INST_GLC=1 -DNUM_COMP_INST_ROF=1 -DNUM_COMP_INST_WAV=1 -DNUM_COMP_INST_IAC=1 -DNUM_COMP_INST_ESP=1 -DCAM -D_WK_GRAD -DNP=4 -DHAVE_F2003_PTR_BND_REMAP -DFVM_TRACERS -D_MPI -DPLEV=32 -DPCNST=34 -DPCOLS=16 -DPSUBCOLS=1 -DN_RAD_CNST=30 -DPTRM=1 -DPTRN=1 -DPTRK=1 -DSPMD -DMODAL_AERO -DMODAL_AERO_4MODE -DCLUBB_SGS -DCLUBB_CAM -DNO_LAPACK_ISNAN -DCLUBB_REAL_TYPE=dp -DMODEL_ -DMODEL_CESM -DHEMCO_CESM -DUSE_REAL8 -DCNL -DCESMCOUPLED -DFORTRANUNDERSCORE -DNO_SHR_VMATH -DNO_R16 -DCPRPGI -DLINUX -DHAVE_GETTID -DDEBUG -DUSE_ESMF_LIB -DHAVE_MPI -DNUOPC_INTERFACE -DPIO2 -DHAVE_SLASHPROC -D_PNETCDF -DESMF_VERSION_MAJOR=8 
-DESMF_VERSION_MINOR=6 -DATM_PRESENT -DICE_PRESENT -DLND_PRESENT -DOCN_PRESENT -DROF_PRESENT -DGLC_PRESENT -DWAV_PRESENT -DESP_PRESENT -DMED_PRESENT -DPIO2 -Mfree -DUSE_CONTIGUOUS= -I/glade/u/apps/derecho/23.09/spack/opt/spack/parallel-netcdf/1.12.3/cray-mpich/8.1.27/nvhpc/23.7/dxl5/include -I/glade/u/apps/cseg/derecho/23.09/spack/opt/spack/linux-sles15-x86_64_v3/nvhpc-23.7/parallelio-2.6.2-wsmqqsn6khspxuqsuu4ndxdojyhi5f7w/include -I/glade/u/apps/derecho/23.09/spack/opt/spack/netcdf/4.9.2/cray-mpich/8.1.27/nvhpc/23.7/pebp/include -I/glade/u/apps/derecho/23.09/spack/opt/spack/hdf5/1.12.2/cray-mpich/8.1.27/nvhpc/23.7/3bun/include -I/glade/u/apps/cseg/derecho/23.09/spack/opt/spack/linux-sles15-x86_64_v3/nvhpc-23.7/esmf-8.6.0-ra2gu3rr6jfachugfbjy4im557ezfgce/include -I/glade/u/apps/derecho/23.09/opt/view/include -I/opt/cray/pe/mpich/8.1.27/ofi/nvidia/20.7/include -I/opt/cray/pe/pmi/6.1.12/include -I/opt/cray/pe/pals/1.2.12/include' -def unix -def __unix -def __unix__ -def linux -def __linux -def __linux__ -def __NO_MATH_INLINES -def __LP64__ -def __x86_64 -def __x86_64__ -def __LONG_MAX__=9223372036854775807L -def '__SIZE_TYPE__=unsigned long int' -def '__PTRDIFF_TYPE__=long int' -def __amd64 -def __amd64__ -def __k8 -def __k8__ -def __MMX__ -def __SSE__ -def __SSE2__ -def __SSE3__ -def __SSSE3__ -def __SSE4A__ -def __ABM__ -def __SSE4_1__ -def __SSE4_2__ -def __AVX__ -def __AVX2__ -def __F16C__ -def __FMA__ -def __XSAVE__ -def __XSAVEOPT__ -def __XSAVEC__ -def __XSAVES__ -def __POPCNT__ -def __SHA__ -def __AES__ -def __PCLMUL__ -def __CLFLUSHOPT__ -def __FSGSBASE__ -def __RDRND__ -def __BMI__ -def __BMI2__ -def __LZCNT__ -def __FXSR__ -def __MWAITX__ -def __CLZERO__ -def __PKU__ -def __VAES__ -def __VPCLMULQDQ__ -idir . 
-idir /glade/derecho/scratch/sunjian/SMS_D_Ln9.ne30pg3_ne30pg3_mg17.QPC6.derecho_nvhpc.cam-outfrq9s.20240606_123033_fbc8en/bld/nvhpc/mpich/debug/nothreads/nuopc/CDEPS/fox/include -idir /glade/derecho/scratch/sunjian/SMS_D_Ln9.ne30pg3_ne30pg3_mg17.QPC6.derecho_nvhpc.cam-outfrq9s.20240606_123033_fbc8en/bld/nvhpc/mpich/debug/nothreads/nuopc/CDEPS/dshr -idir /glade/derecho/scratch/sunjian/SMS_D_Ln9.ne30pg3_ne30pg3_mg17.QPC6.derecho_nvhpc.cam-outfrq9s.20240606_123033_fbc8en/bld/nvhpc/mpich/debug/nothreads/nuopc/include -idir /glade/derecho/scratch/sunjian/SMS_D_Ln9.ne30pg3_ne30pg3_mg17.QPC6.derecho_nvhpc.cam-outfrq9s.20240606_123033_fbc8en/bld/nvhpc/mpich/debug/nothreads/nuopc/nuopc/esmf/c1a1o1/include -idir /glade/derecho/scratch/sunjian/SMS_D_Ln9.ne30pg3_ne30pg3_mg17.QPC6.derecho_nvhpc.cam-outfrq9s.20240606_123033_fbc8en/bld/nvhpc/mpich/debug/nothreads/nuopc/finclude -idir /glade/u/apps/derecho/23.09/spack/opt/spack/netcdf/4.9.2/cray-mpich/8.1.27/nvhpc/23.7/pebp/include -idir /glade/u/apps/derecho/23.09/spack/opt/spack/parallel-netcdf/1.12.3/cray-mpich/8.1.27/nvhpc/23.7/dxl5/include -idir /glade/u/apps/cseg/derecho/23.09/spack/opt/spack/linux-sles15-x86_64_v3/nvhpc-23.7/parallelio-2.6.2-wsmqqsn6khspxuqsuu4ndxdojyhi5f7w/include -idir /glade/derecho/scratch/sunjian/SMS_D_Ln9.ne30pg3_ne30pg3_mg17.QPC6.derecho_nvhpc.cam-outfrq9s.20240606_123033_fbc8en/bld/nvhpc/mpich/debug/nothreads/nuopc/include -idir /glade/derecho/scratch/sunjian/SMS_D_Ln9.ne30pg3_ne30pg3_mg17.QPC6.derecho_nvhpc.cam-outfrq9s.20240606_123033_fbc8en/bld/lnd/obj -idir /glade/derecho/scratch/sunjian/SMS_D_Ln9.ne30pg3_ne30pg3_mg17.QPC6.derecho_nvhpc.cam-outfrq9s.20240606_123033_fbc8en/bld/lnd/obj -idir . 
-idir /glade/derecho/scratch/sunjian/SMS_D_Ln9.ne30pg3_ne30pg3_mg17.QPC6.derecho_nvhpc.cam-outfrq9s.20240606_123033_fbc8en/SourceMods/src.cam -idir /glade/derecho/scratch/sunjian/SMS_D_Ln9.ne30pg3_ne30pg3_mg17.QPC6.derecho_nvhpc.cam-outfrq9s.20240606_123033_fbc8en/SourceMods/src.cam -idir /glade/derecho/scratch/sunjian/cam_gpu/src/unit_drivers -idir /glade/derecho/scratch/sunjian/cam_gpu/src/unit_drivers/stub -idir /glade/derecho/scratch/sunjian/cam_gpu/src/infrastructure -idir /glade/derecho/scratch/sunjian/cam_gpu/src/chemistry/pp_trop_mam4 -idir /glade/derecho/scratch/sunjian/cam_gpu/src/chemistry/modal_aero -idir /glade/derecho/scratch/sunjian/cam_gpu/src/chemistry/aerosol -idir /glade/derecho/scratch/sunjian/cam_gpu/src/ionosphere -idir /glade/derecho/scratch/sunjian/cam_gpu/src/chemistry/mozart -idir /glade/derecho/scratch/sunjian/cam_gpu/src/hemco -idir /glade/derecho/scratch/sunjian/cam_gpu/src/hemco/HEMCO/src/Shared -idir /glade/derecho/scratch/sunjian/cam_gpu/src/hemco/HEMCO/src/Shared/Headers -idir /glade/derecho/scratch/sunjian/cam_gpu/src/hemco/HEMCO/src/Shared/GeosUtil -idir /glade/derecho/scratch/sunjian/cam_gpu/src/hemco/HEMCO/src/Shared/NcdfUtil -idir /glade/derecho/scratch/sunjian/cam_gpu/src/hemco/HEMCO/src/Core -idir /glade/derecho/scratch/sunjian/cam_gpu/src/hemco/HEMCO/src/Extensions -idir /glade/derecho/scratch/sunjian/cam_gpu/src/hemco/HEMCO/src/Interfaces/Shared -idir /glade/derecho/scratch/sunjian/cam_gpu/src/chemistry/utils -idir /glade/derecho/scratch/sunjian/cam_gpu/src/physics/rrtmg -idir /glade/derecho/scratch/sunjian/cam_gpu/src/physics/rrtmg/aer_src -idir /glade/derecho/scratch/sunjian/cam_gpu/src/physics/clubb/src/CLUBB_core -idir /glade/derecho/scratch/sunjian/cam_gpu/src/physics/pumas-frozen -idir /glade/derecho/scratch/sunjian/cam_gpu/src/physics/cam -idir /glade/derecho/scratch/sunjian/cam_gpu/src/atmos_phys/zhang_mcfarlane -idir /glade/derecho/scratch/sunjian/cam_gpu/src/dynamics/se -idir 
/glade/derecho/scratch/sunjian/cam_gpu/src/dynamics/se/dycore -idir /glade/derecho/scratch/sunjian/cam_gpu/src/dynamics/tests -idir /glade/derecho/scratch/sunjian/cam_gpu/src/cpl/nuopc -idir /glade/derecho/scratch/sunjian/cam_gpu/src/control -idir /glade/derecho/scratch/sunjian/cam_gpu/src/utils -idir /glade/derecho/scratch/sunjian/cam_gpu/src/utils/cam_ccpp -idir /glade/derecho/scratch/sunjian/cam_gpu/src/atmos_phys/utilities -idir /glade/derecho/scratch/sunjian/SMS_D_Ln9.ne30pg3_ne30pg3_mg17.QPC6.derecho_nvhpc.cam-outfrq9s.20240606_123033_fbc8en/bld/lib/include -idir /glade/u/apps/cseg/derecho/23.09/spack/opt/spack/linux-sles15-x86_64_v3/nvhpc-23.7/esmf-8.6.0-ra2gu3rr6jfachugfbjy4im557ezfgce/include -idir /glade/u/apps/cseg/derecho/23.09/spack/opt/spack/linux-sles15-x86_64_v3/nvhpc-23.7/esmf-8.6.0-ra2gu3rr6jfachugfbjy4im557ezfgce/include -idir /glade/work/csgteam/spack-deployments/derecho/23.09/envs/build/opt/__spack_path_placeholder__/__spack_path_placeholder__/__spack/netcdf-c/4.9.2/cray-mpich/8.1.27/nvhpc/23.7/c5du/include -idir /glade/u/apps/derecho/23.09/spack/opt/spack/netcdf-fortran/4.6.1/cray-mpich/8.1.27/nvhpc/23.7/mlml/include -idir /glade/u/apps/cseg/derecho/23.09/spack/opt/spack/linux-sles15-x86_64_v3/nvhpc-23.7/parallelio-2.6.2-wsmqqsn6khspxuqsuu4ndxdojyhi5f7w/include -idir /glade/u/apps/derecho/23.09/spack/opt/spack/parallel-netcdf/1.12.3/cray-mpich/8.1.27/nvhpc/23.7/dxl5/include -idir /glade/u/apps/cseg/derecho/23.09/spack/opt/spack/linux-sles15-x86_64_v3/nvhpc-23.7/parallelio-2.6.2-wsmqqsn6khspxuqsuu4ndxdojyhi5f7w/include -idir /glade/u/apps/derecho/23.09/spack/opt/spack/netcdf/4.9.2/cray-mpich/8.1.27/nvhpc/23.7/pebp/include -idir /glade/u/apps/derecho/23.09/spack/opt/spack/hdf5/1.12.2/cray-mpich/8.1.27/nvhpc/23.7/3bun/include -idir /glade/u/apps/cseg/derecho/23.09/spack/opt/spack/linux-sles15-x86_64_v3/nvhpc-23.7/esmf-8.6.0-ra2gu3rr6jfachugfbjy4im557ezfgce/include -idir /glade/u/apps/derecho/23.09/opt/view/include -idir 
/opt/cray/pe/mpich/8.1.27/ofi/nvidia/20.7/include -idir /opt/cray/pe/pmi/6.1.12/include -idir /opt/cray/pe/pals/1.2.12/include -def __PGLLVM__ -def __NVCOMPILER_LLVM__ -def __extension__= -def __CRAY_X86_MILAN -def __CRAYXT_COMPUTE_LINUX_TARGET -def PLON=1 -def PLAT=1 -def NUM_COMP_INST_ATM=1 -def NUM_COMP_INST_LND=1 -def NUM_COMP_INST_OCN=1 -def NUM_COMP_INST_ICE=1 -def NUM_COMP_INST_GLC=1 -def NUM_COMP_INST_ROF=1 -def NUM_COMP_INST_WAV=1 -def NUM_COMP_INST_IAC=1 -def NUM_COMP_INST_ESP=1 -def CAM -def _WK_GRAD -def NP=4 -def HAVE_F2003_PTR_BND_REMAP -def FVM_TRACERS -def _MPI -def PLEV=32 -def PCNST=34 -def PCOLS=16 -def PSUBCOLS=1 -def N_RAD_CNST=30 -def PTRM=1 -def PTRN=1 -def PTRK=1 -def SPMD -def MODAL_AERO -def MODAL_AERO_4MODE -def CLUBB_SGS -def CLUBB_CAM -def NO_LAPACK_ISNAN -def CLUBB_REAL_TYPE=dp -def MODEL_ -def MODEL_CESM -def HEMCO_CESM -def USE_REAL8 -def CNL -def CESMCOUPLED -def FORTRANUNDERSCORE -def NO_SHR_VMATH -def NO_R16 -def CPRPGI -def LINUX -def HAVE_GETTID -def DEBUG -def USE_ESMF_LIB -def HAVE_MPI -def NUOPC_INTERFACE -def PIO2 -def HAVE_SLASHPROC -def _PNETCDF -def ESMF_VERSION_MAJOR=8 -def ESMF_VERSION_MINOR=6 -def ATM_PRESENT -def ICE_PRESENT -def LND_PRESENT -def OCN_PRESENT -def ROF_PRESENT -def GLC_PRESENT -def WAV_PRESENT -def ESP_PRESENT -def MED_PRESENT -def PIO2 -def USE_CONTIGUOUS= -preprocess -freeform -i4 -extend -vect 48 -x 54 1 -ieee 1 -x 68 0x1 -x 70 0x40000000 -x 70 0x40000000 -x 68 0x1 -x 124 1 -x 195 0x8000 -y 163 0xc0000000 -x 163 0x800000 -x 189 0x10 -x 49 0x1000 -x 125 2 -x 49 0x1000 -x 70 2 -freeform -stbfile /glade/derecho/scratch/sunjian/tmp/nvfortrangLVtsbSlP6Y0.stb -modexport /glade/derecho/scratch/sunjian/tmp/nvfortran2LVtIlmKFBsM.cmod -modindex /glade/derecho/scratch/sunjian/tmp/nvfortranMLVtYNTyLo_f.cmdx -cci /glade/derecho/scratch/sunjian/tmp/nvfortranwLVtc5eOKt4l.cci -output /glade/derecho/scratch/sunjian/tmp/nvfortranMLVtYIa1_q3i.ilm
gmake: *** [/glade/derecho/scratch/sunjian/SMS_D_Ln9.ne30pg3_ne30pg3_mg17.QPC6.derecho_nvhpc.cam-outfrq9s.20240606_123033_fbc8en/Tools/Makefile:1001: fvm_consistent_se_cslam.o] Error 127
gmake: *** Waiting for unfinished jobs....
NVFORTRAN-S-0000-Internal compiler error. memsym_of_ast:unexp.ast    3424  (/glade/derecho/scratch/sunjian/cam_gpu/src/dynamics/se/dycore/fvm_mapping.F90: 1224)
NVFORTRAN-S-0000-Internal compiler error. memsym_of_ast:unexp.ast    3424  (/glade/derecho/scratch/sunjian/cam_gpu/src/dynamics/se/dycore/fvm_mapping.F90: 1224)
  0 inform,   0 warnings,   2 severes, 0 fatal for get_fvm_recons
gmake: *** [/glade/derecho/scratch/sunjian/SMS_D_Ln9.ne30pg3_ne30pg3_mg17.QPC6.derecho_nvhpc.cam-outfrq9s.20240606_123033_fbc8en/Tools/Makefile:1001: fvm_mapping.o] Error 2

The error points to a specific line in get_fvm_recons: https://github.com/ESCOMP/CAM/blob/cam_development/src/dynamics/se/dycore/fvm_mapping.F90#L1224.

If the debug build works with other compilers such as Intel or GNU, then unfortunately this is probably another NVHPC compiler bug.

brian-eaton commented 5 months ago

I have run tests to verify that both the intel and the gnu compilers can successfully build and run the F2000dev test with debugging enabled. This result holds for both the FV and the SE dycores.

sjsprecious commented 5 months ago

Thanks @brian-eaton for doing those additional checks. I think this further confirms that the issue lies on the NVIDIA compiler side.