E3SM-Project / E3SM

Energy Exascale Earth System Model source code. NOTE: use "maint" branches for your work. Head of master is not validated.
https://docs.e3sm.org/E3SM

AMD compiler: mozart/UCI_cld_sub_mod.f90 parsing errors #5692

Open sarats opened 1 year ago

sarats commented 1 year ago

On Frontier, the AMD compiler is having trouble parsing mozart/UCI_cld_sub_mod.f90.

Compiler version

$ amdflang --version
AMD flang-new version 15.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.4.3 23045 a29fe425c7b0e5aba97ed2f95f61fd5ecba68aed)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-5.4.3/llvm/bin
[sarat@login03 atm ]$ /opt/cray/pe/craype/2.7.20/bin/ftn -DBIT64 -DCAM -DCNL -DCO2A -DCPRAMD -DCRM_DT=10 -DCRM_DX=2000 -DCRM_NX=64 -DCRM_NX_RAD=4 -DCRM_NY=1 -DCRM_NY_RAD=1 -DCRM_NZ=50 -DFORTRANUNDERSCORE -DHAVE_COMM_F2C -DHAVE_F2003_PTR_BND_REMAP -DHAVE_GETTIMEOFDAY -DHAVE_MPI -DHAVE_NANOTIME -DHAVE_SLASHPROC -DHAVE_TIMES -DHAVE_VPRINTF -DHOMME_ENABLE_COMPOSE -DLINUX -DLSMLAT=1 -DLSMLON=1 -DMAXPATCH_PFT=numpft+1 -DMCT_INTERFACE -DMMF_SAMXX -DMODEL_THETA_L -DNC=4 -DNDEBUG -DNO_R16 -DNP=4 -DNPG=2 -DN_RAD_CNST=30 -DPCNST=12 -DPCOLS=16 -DPLAT=1 -DPLEV=60 -DPLON=384 -DPSUBCOLS=1 -DPTRK=1 -DPTRM=1 -DPTRN=1 -DSPMD -DUSE_COSP -DYES3DVAL=0 -D_MPI -D_PNETCDF -D_PRIM -Dsam1mom -I/lustre/orion/cli115/proj-shared/testing/S/ERS_Ln9.ne4pg2_ne4pg2.FRCE-MMF1.crusher_amdclang.eam-cosp_nhtfrq9.JNI230514_001955/bld/cmake-bld/cmake/atm/yakl -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/cmake/atm/. -I/lustre/orion/cli115/proj-shared/testing/S/ERS_Ln9.ne4pg2_ne4pg2.FRCE-MMF1.crusher_amdclang.eam-cosp_nhtfrq9.JNI230514_001955/bld/amdclang/mpich/nodebug/nothreads/mct/mct/noesmf/c1a1l1i1o1r1g1w1i1e1/include -I/opt/cray/pe/netcdf-hdf5parallel/4.9.0.1/amd/4.3/include -I/opt/cray/pe/mpich/8.1.23/ofi/amd/5.0/include -I/opt/cray/pe/parallel-netcdf/1.12.3.1/amd/4.3/include -I/lustre/orion/cli115/proj-shared/testing/S/J/ERS_Ln9.ne4pg2_ne4pg2.FRCE-MMF1.crusher_amdclang.eam-cosp_nhtfrq9.JNI230514_001955/SourceMods/src.eam -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/physics/crm -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/physics/crm/rrtmgp -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/chemistry/pp_none -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/chemistry/bulk_aero -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/chemistry/aerosol -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/chemistry/mozart 
-I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/chemistry/utils -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/physics/rrtmgp -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/physics/cam -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/physics/p3/eam -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/dynamics/se -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/homme/src/share -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/homme/src/theta-l -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/homme/src/theta-l/share -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/homme/src/share/compose -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/cpl -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/control -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/utils -I/lustre/orion/cli115/proj-shared/testing/S/ERS_Ln9.ne4pg2_ne4pg2.FRCE-MMF1.crusher_amdclang.eam-cosp_nhtfrq9.JNI230514_001955/bld/lnd/obj -I/lustre/orion/cli115/proj-shared/testing/E3SM/externals/YAKL/src -I/lustre/orion/cli115/proj-shared/testing/E3SM/externals/YAKL/src/extensions -I/lustre/orion/cli115/proj-shared/testing/E3SM/externals/YAKL/external -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/cmake/atm/../../../externals/YAKL -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/physics/rrtmgp/external/cpp/. 
-I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/physics/rrtmgp/external/cpp/rte -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/physics/rrtmgp/external/cpp/rte/kernels -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/physics/rrtmgp/external/cpp/rrtmgp -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/physics/rrtmgp/external/cpp/rrtmgp/kernels -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/physics/rrtmgp/cpp/../external/cpp -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/physics/rrtmgp/cpp/../external/cpp/rte -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/physics/rrtmgp/cpp/../external/cpp/rrtmgp -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/physics/rrtmgp/cpp/../external/cpp/extensions/cloud_optics -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/physics/rrtmgp/cpp/../external/cpp/extensions/fluxes_byband -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/physics/rrtmgp/cpp/../external/cpp/examples -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/physics/rrtmgp/cpp/../external/cpp/examples/all-sky -isystem /lustre/orion/cli115/proj-shared/testing/S/ERS_Ln9.ne4pg2_ne4pg2.FRCE-MMF1.crusher_amdclang.eam-cosp_nhtfrq9.JNI230514_001955/bld/amdclang/mpich/nodebug/nothreads/mct/include  -c /lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/chemistry/mozart/UCI_cld_sub_mod.f90 -o CMakeFiles/atm.dir/__/__/eam/src/chemistry/mozart/UCI_cld_sub_mod.f90.o -craype-verbose
amdflang -march=znver3 -D__CRAY_X86_TRENTO -D__CRAYXT_COMPUTE_LINUX_TARGET -DBIT64 -DCAM -DCNL -DCO2A -DCPRAMD -DCRM_DT=10 -DCRM_DX=2000 -DCRM_NX=64 -DCRM_NX_RAD=4 -DCRM_NY=1 -DCRM_NY_RAD=1 -DCRM_NZ=50 -DFORTRANUNDERSCORE -DHAVE_COMM_F2C -DHAVE_F2003_PTR_BND_REMAP -DHAVE_GETTIMEOFDAY -DHAVE_MPI -DHAVE_NANOTIME -DHAVE_SLASHPROC -DHAVE_TIMES -DHAVE_VPRINTF -DHOMME_ENABLE_COMPOSE -DLINUX -DLSMLAT=1 -DLSMLON=1 -DMAXPATCH_PFT=numpft+1 -DMCT_INTERFACE -DMMF_SAMXX -DMODEL_THETA_L -DNC=4 -DNDEBUG -DNO_R16 -DNP=4 -DNPG=2 -DN_RAD_CNST=30 -DPCNST=12 -DPCOLS=16 -DPLAT=1 -DPLEV=60 -DPLON=384 -DPSUBCOLS=1 -DPTRK=1 -DPTRM=1 -DPTRN=1 -DSPMD -DUSE_COSP -DYES3DVAL=0 -D_MPI -D_PNETCDF -D_PRIM -Dsam1mom -I/lustre/orion/cli115/proj-shared/testing/S/ERS_Ln9.ne4pg2_ne4pg2.FRCE-MMF1.crusher_amdclang.eam-cosp_nhtfrq9.JNI230514_001955/bld/cmake-bld/cmake/atm/yakl -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/cmake/atm/. -I/lustre/orion/cli115/proj-shared/testing/S/ERS_Ln9.ne4pg2_ne4pg2.FRCE-MMF1.crusher_amdclang.eam-cosp_nhtfrq9.JNI230514_001955/bld/amdclang/mpich/nodebug/nothreads/mct/mct/noesmf/c1a1l1i1o1r1g1w1i1e1/include -I/opt/cray/pe/netcdf-hdf5parallel/4.9.0.1/amd/4.3/include -I/opt/cray/pe/mpich/8.1.23/ofi/amd/5.0/include -I/opt/cray/pe/parallel-netcdf/1.12.3.1/amd/4.3/include -I/lustre/orion/cli115/proj-shared/testing/S/J/ERS_Ln9.ne4pg2_ne4pg2.FRCE-MMF1.crusher_amdclang.eam-cosp_nhtfrq9.JNI230514_001955/SourceMods/src.eam -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/physics/crm -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/physics/crm/rrtmgp -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/chemistry/pp_none -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/chemistry/bulk_aero -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/chemistry/aerosol -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/chemistry/mozart 
-I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/chemistry/utils -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/physics/rrtmgp -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/physics/cam -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/physics/p3/eam -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/dynamics/se -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/homme/src/share -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/homme/src/theta-l -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/homme/src/theta-l/share -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/homme/src/share/compose -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/cpl -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/control -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/utils -I/lustre/orion/cli115/proj-shared/testing/S/ERS_Ln9.ne4pg2_ne4pg2.FRCE-MMF1.crusher_amdclang.eam-cosp_nhtfrq9.JNI230514_001955/bld/lnd/obj -I/lustre/orion/cli115/proj-shared/testing/E3SM/externals/YAKL/src -I/lustre/orion/cli115/proj-shared/testing/E3SM/externals/YAKL/src/extensions -I/lustre/orion/cli115/proj-shared/testing/E3SM/externals/YAKL/external -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/cmake/atm/../../../externals/YAKL -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/physics/rrtmgp/external/cpp/. 
-I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/physics/rrtmgp/external/cpp/rte -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/physics/rrtmgp/external/cpp/rte/kernels -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/physics/rrtmgp/external/cpp/rrtmgp -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/physics/rrtmgp/external/cpp/rrtmgp/kernels -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/physics/rrtmgp/cpp/../external/cpp -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/physics/rrtmgp/cpp/../external/cpp/rte -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/physics/rrtmgp/cpp/../external/cpp/rrtmgp -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/physics/rrtmgp/cpp/../external/cpp/extensions/cloud_optics -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/physics/rrtmgp/cpp/../external/cpp/extensions/fluxes_byband -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/physics/rrtmgp/cpp/../external/cpp/examples -I/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/physics/rrtmgp/cpp/../external/cpp/examples/all-sky -isystem /lustre/orion/cli115/proj-shared/testing/S/ERS_Ln9.ne4pg2_ne4pg2.FRCE-MMF1.crusher_amdclang.eam-cosp_nhtfrq9.JNI230514_001955/bld/amdclang/mpich/nodebug/nothreads/mct/include -c /lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/chemistry/mozart/UCI_cld_sub_mod.f90 -o CMakeFiles/atm.dir/__/__/eam/src/chemistry/mozart/UCI_cld_sub_mod.f90.o -I/opt/cray/pe/mpich/8.1.23/ofi/amd/5.0/include -I/opt/cray/pe/libsci/22.12.1.1/AMD/4.0/x86_64/include -I/opt/cray/pe/dsmml/0.2.2/dsmml//include -I/opt/cray/pe/pmi/6.1.8/include -I/opt/cray/xpmem/2.5.2-2.4_3.45__gd0f7936.shasta/include
F90-S-0034-Syntax error at or near integer constant 4 (/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/chemistry/mozart/UCI_cld_sub_mod.f90: 573)
F90-S-0034-Syntax error at or near = (/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/chemistry/mozart/UCI_cld_sub_mod.f90: 846)
F90-S-0034-Syntax error at or near = (/lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/chemistry/mozart/UCI_cld_sub_mod.f90: 849)
  0 inform,   0 warnings,   3 severes, 0 fatal for ica_nr

The source lines in question.

 573       integer  I,K,L,LL,N,NC, L1,L2,L3,  LCLTOP,LCIRRUS 
 846         NC = 1
 849             NC = NC+1

Additional context for source. A common pattern is the presence of CBIN_ around the site of the errors.

 572       integer, dimension(CBIN_) :: NSAME
 573       integer  I,K,L,LL,N,NC, L1,L2,L3,  LCLTOP,LCIRRUS
...
 845         GFNR(N,1) = CBIN_
 846         NC = 1
 847         do I = CBIN_-1,1,-1
 848           if(NSAME(I) .gt. 0) then
 849             NC = NC+1
 850             GFNR(N,NC) = I
 851           endif
 852         enddo
sarats commented 1 year ago

Ref: Source file in master: https://github.com/E3SM-Project/E3SM/blob/master/components/eam/src/chemistry/mozart/UCI_cld_sub_mod.f90#L573

sarats commented 1 year ago

With the include paths and similar options removed, the following command still reproduces the error.

amdflang -march=znver3 -DBIT64 -DCAM -DCNL -DCO2A -DCPRAMD -DCRM_DT=10 -DCRM_DX=2000 -DCRM_NX=64 -DCRM_NX_RAD=4 -DCRM_NY=1 -DCRM_NY_RAD=1 -DCRM_NZ=50 -DFORTRANUNDERSCORE -DHAVE_COMM_F2C -DHAVE_F2003_PTR_BND_REMAP -DHAVE_GETTIMEOFDAY -DHAVE_MPI -DHAVE_NANOTIME -DHAVE_SLASHPROC -DHAVE_TIMES -DHAVE_VPRINTF -DHOMME_ENABLE_COMPOSE -DLINUX -DLSMLAT=1 -DLSMLON=1 -DMAXPATCH_PFT=numpft+1 -DMCT_INTERFACE -DMMF_SAMXX -DMODEL_THETA_L -DNC=4 -DNDEBUG -DNO_R16 -DNP=4 -DNPG=2 -DN_RAD_CNST=30 -DPCNST=12 -DPCOLS=16 -DPLAT=1 -DPLEV=60 -DPLON=384 -DPSUBCOLS=1 -DPTRK=1 -DPTRM=1 -DPTRN=1 -DSPMD -DUSE_COSP -DYES3DVAL=0 -D_MPI -D_PNETCDF -D_PRIM -Dsam1mom -c /lustre/orion/cli115/proj-shared/testing/E3SM/components/eam/src/chemistry/mozart/UCI_cld_sub_mod.f90
liho745 commented 1 year ago

@rljacob isn't this an EAM/atmosphere issue rather than a MOSART/river issue?

rljacob commented 1 year ago

Sorry. I got mozart chemistry confused with mosart river.

tangq commented 1 year ago

@sarats, line 573 in question seems okay to me:

integer I,K,L,LL,N,NC, L1,L2,L3, LCLTOP,LCIRRUS

I am not sure why the AMD compiler complains about the syntax, yet it reports no errors on line 570 above:

integer :: NRGX, NICAX

I don't have access to Frontier to test it. Can you check whether this works (just a guess)?

integer :: I,K,L,LL,N,NC, L1,L2,L3, LCLTOP,LCIRRUS

sarats commented 1 year ago

I tried that already. It looks like a compiler bug - we will submit to OLCF. cc @grnydawn.

sarats commented 1 year ago

@tangq I'm trying to put together a standalone reproducer. Can you point to where CBIN_ is defined? Is it passed as an argument or defined elsewhere?

sarats commented 1 year ago

Never mind, found it: https://github.com/E3SM-Project/E3SM/blob/master/components/eam/src/chemistry/mozart/UCI_fjx_cmn_mod.F90#LL246

integer, parameter :: CBIN_ = 10 ! # of quantized cloud fration bins

tangq commented 1 year ago

Great that you found it already.

sarats commented 1 year ago

Submitted to OLCF, ref OLCFHELP-12499.

sarats commented 1 year ago

Last response from OLCF:

I've been told that amdflang is currently not actively getting support as AMD is working on a new compiler to replace it. So I am doubtful we'll get a fix for this. So if you're able to use the Cray or GNU compilers, then our recommendation is to stick to those for Fortran.

FWIW, we always knew that AMD's Fortran is flaky. There are currently 62 build errors and 67 test failures on Crusher. Even recently, we had as few as 10 build errors and 15 total test failures. https://my.cdash.org/build/2340290

Anyway, we should probably pause the automated nightly tests with amdflang on Crusher. We can revisit once we identify any workaround.

whannah1 commented 1 year ago

@sarats I just stumbled on the same problem using GNU on Summit, and I think it's related to the "-DNC=4" that is added for the SE build, which seems to be replacing the "NC" variable in the UCI chemistry code...? However, this only happens with the new MMF I'm working on integrating and not the "main" MMF or the non-MMF configurations. So I'm very confused about this... All cases define "-DNC=4" and all cases build the UCI files...

I can't supply a reproducer at the moment because this is only happening in a temporary branch that I'm using to test a merge with the current master branch into my PAM development branch.
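
The substitution hypothesis above can be illustrated with a small sketch. The Python below is not how any compiler implements preprocessing; it only mimics whole-token macro replacement on the three lines flagged in the error log plus line 850, assuming NC is defined as 4. The function name apply_defines is hypothetical, and the sketch ignores string literals and comments.

```python
import re

# Matches one identifier-like token, so "NC" is replaced but "NCOL" is not.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def apply_defines(line, defines):
    """Replace each whole token that is a macro name with the macro's value.

    Matching is case-sensitive, as it is for C-style macro names.
    """
    return IDENTIFIER.sub(lambda m: defines.get(m.group(0), m.group(0)), line)

# Assumption: only the -DNC=4 define matters for these lines.
defines = {"NC": "4"}

source_lines = {
    573: "integer  I,K,L,LL,N,NC, L1,L2,L3,  LCLTOP,LCIRRUS",
    846: "NC = 1",
    849: "NC = NC+1",
    850: "GFNR(N,NC) = I",
}

for number, text in source_lines.items():
    print(number, apply_defines(text, defines))
# 573 integer  I,K,L,LL,N,4, L1,L2,L3,  LCLTOP,LCIRRUS
# 846 4 = 1
# 849 4 = 4+1
# 850 GFNR(N,4) = I
```

Under this assumption, lines 573, 846, and 849 are no longer valid Fortran, which is consistent with "Syntax error at or near integer constant 4" and "Syntax error at or near =". Line 850 stays syntactically valid after substitution, which would explain why it is not reported.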

oksanaguba commented 1 year ago

IIRC, NC was used for a previous implementation of SL transport in Homme. It is still in a lot of Homme files, though I did not look closely.

ambrad commented 1 year ago

@oksanaguba would you clarify how this is related to SL transport? I don't see the NC symbol at all in Homme. Homme of course uses NP extensively, independently of transport method and other details.

Edit: Are you referring to that ancient, pre-COMPOSE, version from before my time? It is interesting that eam/bld/configure has this:

if ($dyn_pkg eq 'se') {
    my $csnp = $cfg_ref->get('csnp');
    $cfg_cppdefs .= " -DNP=$csnp -DNC=4 -DHAVE_F2003_PTR_BND_REMAP";

and yet Homme doesn't use NC.
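
A quick way to audit which symbols from a define string like the one above also appear as identifiers in a given source file is sketched below. This is a minimal Python sketch, not part of the E3SM build system; the helper names parse_defines and find_collisions are hypothetical, and comment stripping is naive (everything after the first "!" on a line is dropped, even inside a string literal).

```python
import re

def parse_defines(flags):
    """Collect macro names and values from -DNAME and -DNAME=VALUE tokens."""
    defines = {}
    for token in flags.split():
        if token.startswith("-D") and len(token) > 2:
            name, sep, value = token[2:].partition("=")
            # A bare -DNAME defines the macro as 1, following the usual cpp rule.
            defines[name] = value if sep else "1"
    return defines

def find_collisions(source, defines):
    """Return (line number, name) pairs where a token equals a macro name."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        code = line.split("!", 1)[0]  # naive removal of a trailing comment
        for name in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", code):
            if name in defines:
                hits.append((lineno, name))
    return hits

# The define string built in the configure snippet, with csnp assumed to be 4.
cppdefs = " -DNP=4 -DNC=4 -DHAVE_F2003_PTR_BND_REMAP"

# Illustrative input only; not the real contents of UCI_cld_sub_mod.f90.
sample = """integer  I,K,L,LL,N,NC, L1,L2,L3,  LCLTOP,LCIRRUS
NC = 1   ! NP in a comment is ignored
GFNR(N,NC) = I
"""

print(find_collisions(sample, parse_defines(cppdefs)))
# [(1, 'NC'), (2, 'NC'), (3, 'NC')]
```

A collision only matters if the file is actually run through the preprocessor, so a report like this is a list of candidates rather than a list of errors.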

oksanaguba commented 1 year ago

Yes, I meant the old FV scheme, maybe from CSLAM and/or C. Erath's work. I agree that the EAM build of Homme probably does not need NC, but NC is in a lot of standalone files, and also in

src/theta-l/config.h.cmake.in:#define NC @NUM_CELLS@

Maybe simply removing it from eam config would work, or not.

ambrad commented 1 year ago

I think I see the issue. First, for cleanliness, we definitely should not define NC in bld/configure. We also ought to make NP something like HOMME_NP.

But, second, the AMD-related issue arises because the compiler does not differentiate between the f90 and F90 file extensions. By convention, the first does not trigger the macro preprocessor, while the second does. The chemistry files in question use f90, so with most compilers, using NC as a variable name is safe. With the AMD one, evidently it's not.

Here's a reproducer with gfortran to explain what I mean:

$ cat define.F90
program main
  integer :: NC, NP
  NC = 1
  NP = 3
  print *, NC, NP
end program main
$ gfortran -DNC=4 define.F90
define.F90:2:14:

   integer :: NC, NP
              1
Error: Invalid character in name at (1)
define.F90:3:5:

   NC = 1
     1
Error: Invalid character in name at (1)
$ cp define.F90 define.f90 
$ gfortran -DNC=4 define.f90
$ ./a.out
           1           3
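
The extension convention behind this reproducer can be written down as a small lookup. This is a Python sketch assuming the default behaviour documented for gfortran, where a fixed set of extensions (mostly upper-case ones such as .F90) is preprocessed automatically; other compilers may behave differently, which is the point of this issue. The names PREPROCESSED_EXTS and preprocessed_by_default are hypothetical.

```python
import os

# Assumption: the extensions gfortran documents as preprocessed by default.
PREPROCESSED_EXTS = {
    ".fpp", ".FPP", ".F", ".FOR", ".FTN", ".F90", ".F95", ".F03", ".F08",
}

def preprocessed_by_default(path):
    """Return True if the file extension triggers preprocessing by convention."""
    return os.path.splitext(path)[1] in PREPROCESSED_EXTS

for name in ("define.F90", "define.f90", "UCI_cld_sub_mod.f90", "UCI_fjx_cmn_mod.F90"):
    print(name, preprocessed_by_default(name))
# define.F90 True
# define.f90 False
# UCI_cld_sub_mod.f90 False
# UCI_fjx_cmn_mod.F90 True
```

With gfortran, the -cpp and -nocpp options override this default for a given compile, so the extension is only the starting point.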
whannah1 commented 1 year ago

Great find @ambrad! This is consistent with all the symptoms I've been seeing with GNU.

There must be flags to make these compilers distinguish between f90 and F90, right?

rljacob commented 1 year ago

There is probably a flag to force it to run the Fortran internal preprocessor (which should be used instead of cpp anyway).

ambrad commented 1 year ago

I'm testing e3sm_developer with -DNC=4 removed and will report back.

Update: Looks good, so I'll make a PR.