E3SM-Project / E3SM

Energy Exascale Earth System Model source code. NOTE: use "maint" branches for your work. Head of master is not validated.
https://docs.e3sm.org/E3SM

Cray Fortran: incorrect index return value with -h zero (known workaround, Cray ticket open) #5012

Closed xyuan closed 2 years ago

xyuan commented 2 years ago

There is an issue with the Cray Fortran compiler that is associated with the use of return inside a Fortran function; see below.

The example code can be found at https://github.com/E3SM-Project/E3SM/blob/master/components/eam/src/physics/rrtmgp/radconstants.F90

  character(len=gasnamelength), public, parameter :: gaslist(nradgas) &
     = (/'H2O  ','O3   ', 'O2   ', 'CO2  ', 'N2O  ', 'CH4  ', 'CFC11', 'CFC12'/)

  integer function rad_gas_index(gasname)

    ! return the index in the gaslist array of the specified gasname

    character(len=*),intent(in) :: gasname
    integer :: igas

    rad_gas_index = -1
    do igas = 1, nradgas
      if (trim(gaslist(igas)).eq.trim(gasname)) then
        rad_gas_index = igas
        return
      endif
    enddo
    call endrun ("rad_gas_index: can not find gas with name "//gasname)
  end function rad_gas_index

For any gasname given as input, the returned rad_gas_index is 0. For example, index = rad_gas_index("CO2") should return 4, but it returns 0. This bug affects a lot of code in E3SM, and it is hard to work around in every affected function, so I strongly recommend that the Cray Fortran compiler support this usage.
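The report above can be turned into a standalone check. The following is a minimal sketch, not code taken from the E3SM repository: the module name radconstants_sketch and the program name check_rad_gas_index are hypothetical, gasnamelength = 5 is inferred from the length of 'CFC11', and error stop stands in for the endrun call. Whether this exact sketch triggers the compiler bug has not been verified here.

```fortran
! Standalone sketch of the lookup described above.
! Names marked "hypothetical" are not from the E3SM source.
module radconstants_sketch            ! hypothetical module name
  implicit none
  private
  public :: gaslist, nradgas, rad_gas_index

  integer, parameter :: gasnamelength = 5   ! assumed from 'CFC11'
  integer, parameter :: nradgas = 8
  character(len=gasnamelength), parameter :: gaslist(nradgas) = &
       (/'H2O  ', 'O3   ', 'O2   ', 'CO2  ', 'N2O  ', 'CH4  ', 'CFC11', 'CFC12'/)

contains

  integer function rad_gas_index(gasname)
    ! Return the index in gaslist of the specified gasname.
    character(len=*), intent(in) :: gasname
    integer :: igas

    rad_gas_index = -1
    do igas = 1, nradgas
      if (trim(gaslist(igas)) .eq. trim(gasname)) then
        rad_gas_index = igas
        return   ! early return from inside the loop, as in the original
      end if
    end do
    ! The original calls endrun here; error stop is a stand-in.
    error stop 'rad_gas_index: can not find gas with the given name'
  end function rad_gas_index

end module radconstants_sketch

program check_rad_gas_index           ! hypothetical driver
  use radconstants_sketch, only: rad_gas_index
  implicit none

  ! With a correctly behaving compiler these are 4, 6, and 8.
  write(*,*) 'CO2   index = ', rad_gas_index('CO2')
  write(*,*) 'CH4   index = ', rad_gas_index('CH4')
  write(*,*) 'CFC12 index = ', rad_gas_index('CFC12')
end program check_rad_gas_index
```

A value of 0 for any of these lookups would match the behavior described in this report, since the function can only legitimately return an index between 1 and nradgas.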

keziming commented 2 years ago

Hi @xyuan, I am curious whether this problem also affects the other gases.

@keziming

xyuan commented 2 years ago

Yes, it affects all the gases.

sarats commented 2 years ago

cc @twhite-cray @abbotts @mattdturner

xyuan commented 2 years ago

The modules loaded are:

Currently Loaded Modules:
   1) craype/2.7.15
   2) cray-dsmml/0.2.2
   3) PrgEnv-cray/8.3.3
   4) cce/14.0.0
   5) xalt/1.3.0
   6) DefApps/default
   7) libfabric/1.15.0.0
   8) craype-network-ofi
   9) craype-accel-amd-gfx90a
  10) rocm/4.5.2
  11) cray-mpich/8.1.16
  12) cray-python/3.9.4.2
  13) subversion/1.14.0
  14) git/2.31.1
  15) cmake/3.22.2
  16) zlib/1.2.11
  17) cray-libsci/21.08.1.2
  18) cray-hdf5-parallel/1.12.0.7
  19) cray-netcdf-hdf5parallel/4.7.4.7
  20) cray-parallel-netcdf/1.12.1.7

xyuan commented 2 years ago

and the command is: cd /gpfs/alpine/cli115/scratch/yuanx/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.crayclanggpu.1x1/bld/cmake-bld/cmake/atm && python3 /gpfs/alpine/cli115/scratch/yuanx/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.crayclanggpu.1x1/Tools/e3sm_compile_wrap.py /opt/cray/pe/craype/2.7.15/bin/ftn -DBIT64 -DCAM -DCNL -DCO2A -DCPRCRAY -DCRM_DT=10 -DCRM_DX=2000 -DCRM_NX=64 -DCRM_NX_RAD=4 -DCRM_NY=1 -DCRM_NY_RAD=1 -DCRM_NZ=50 -DFORTRANUNDERSCORE -DHAVE_COMM_F2C -DHAVE_F2003_PTR_BND_REMAP -DHAVE_GETTIMEOFDAY -DHAVE_MPI -DHAVE_NANOTIME -DHAVE_SLASHPROC -DHAVE_TIMES -DHAVE_VPRINTF -DLINUX -DLSMLAT=1 -DLSMLON=1 -DMAXPATCH_PFT=numpft+1 -DMCT_INTERFACE -DMMF_SAMXX -DNC=4 -DNDEBUG -DNO_R16 -DNP=4 -DNPG=2 -DN_RAD_CNST=30 -DPCNST=9 -DPCOLS=16 -DPLAT=1 -DPLEV=60 -DPLON=384 -DPSUBCOLS=1 -DPTRK=1 -DPTRM=1 -DPTRN=1 -DSPDLOG_COMPILED_LIB -DSPMD -DYES3DVAL=0 -D_MPDATA -D_MPI -D_PNETCDF -D_PRIM -D__HIP_ROCclr -I/gpfs/alpine/cli115/scratch/yuanx/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.crayclanggpu.1x1/bld/cmake-bld/cmake/atm/yakl -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/cmake/atm/. 
-I/gpfs/alpine/cli115/scratch/yuanx/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.crayclanggpu.1x1/bld/crayclanggpu/mpich/nodebug/nothreads/mct/include -I/gpfs/alpine/cli115/scratch/yuanx/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.crayclanggpu.1x1/bld/crayclanggpu/mpich/nodebug/nothreads/mct/mct/noesmf/c1a1l1i1o1r1g1w1i1e1/include -I/opt/cray/pe/netcdf-hdf5parallel/4.7.4.7/crayclang/10.0/include -I/opt/cray/pe/mpich/8.1.16/ofi/crayclang/10.0/include -I/opt/cray/pe/parallel-netcdf/1.12.1.7/crayclang/10.0/include -I/gpfs/alpine/cli115/scratch/yuanx/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.crayclanggpu.1x1/SourceMods/src.eam -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/crm -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/crm/rrtmgp -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/chemistry/pp_none -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/chemistry/bulk_aero -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/chemistry/aerosol -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/chemistry/mozart -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/chemistry/utils -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/rrtmgp -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/cam -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/dynamics/se -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/homme/src/share -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/homme/src/preqx -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/homme/src/preqx/share -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/cpl -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/control -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/utils 
-I/gpfs/alpine/cli115/scratch/yuanx/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.crayclanggpu.1x1/bld/lnd/obj -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/externals/YAKL -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/externals/YAKL/gptl -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/externals/YAKL/hipCUB/hipcub/include -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/externals/YAKL/rocPRIM/rocprim/include -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/externals/ekat/src -I/gpfs/alpine/cli115/scratch/yuanx/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.crayclanggpu.1x1/bld/cmake-bld/externals/ekat/src -I/gpfs/alpine/cli115/scratch/yuanx/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.crayclanggpu.1x1/bld/cmake-bld/externals/ekat/src/ekat/ekat_f90_modules -I/gpfs/alpine/cli115/scratch/yuanx/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.crayclanggpu.1x1/bld/cmake-bld/externals/kokkos -I/gpfs/alpine/cli115/scratch/yuanx/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.crayclanggpu.1x1/bld/cmake-bld/externals/kokkos/core/src -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/externals/ekat/extern/kokkos/core/src -I/gpfs/alpine/cli115/scratch/yuanx/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.crayclanggpu.1x1/bld/cmake-bld/externals/kokkos/containers/src -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/externals/ekat/extern/kokkos/containers/src -I/gpfs/alpine/cli115/scratch/yuanx/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.crayclanggpu.1x1/bld/cmake-bld/externals/kokkos/algorithms/src -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/externals/ekat/extern/kokkos/algorithms/src -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/externals/ekat/extern/yaml-cpp/include -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/externals/ekat/extern/spdlog/include -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/crm/scream/src/physics/p3/../share 
-I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/crm/scream/src -I/gpfs/alpine/cli115/scratch/yuanx/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.crayclanggpu.1x1/bld/cmake-bld/scream/src -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/crm/scream/src/physics/shoc/../share -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/cmake/atm/../../../externals/YAKL -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/rrtmgp/external/cpp/. -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/rrtmgp/external/cpp/rte -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/rrtmgp/external/cpp/rte/kernels -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/rrtmgp/external/cpp/rrtmgp -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/rrtmgp/external/cpp/rrtmgp/kernels -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/rrtmgp/cpp/../external/cpp -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/rrtmgp/cpp/../external/cpp/rte -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/rrtmgp/cpp/../external/cpp/rrtmgp -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/rrtmgp/cpp/../external/cpp/extensions/cloud_optics -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/rrtmgp/cpp/../external/cpp/extensions/fluxes_byband -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/rrtmgp/cpp/../external/cpp/examples -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/rrtmgp/cpp/../external/cpp/examples/all-sky -em -J. 
-cpp -s default32 -eZ -O2 -h noacc -h zero -hfp0 -I/opt/cray/pe/mpich/8.1.16/ofi/crayclang/10.0/include -I/opt/rocm-4.5.2/include -f free -N 255 -h byteswapio -em -M1077 -DUSE_CONTIGUOUS=contiguous, -c /gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/dynamics/se/dyn_grid.F90 -o CMakeFiles/atm.dir//__/eam/src/dynamics/se/dyn_grid.F90.o

mattdturner commented 2 years ago

@xyuan Thank you for the information about modules and compilation flags, as well as the reproducer you provided.

I was able to replicate this issue on Crusher, and narrowed it down to the use of -h zero. Without -h zero, the correct indices are returned from the function.

This is a bug in the Cray Fortran compiler, and an internal ticket has been opened.

In my testing, Cray Fortran version 13.0.2 gives the correct results and Cray Fortran version 14.0.0 gives the incorrect ones.

Workaround

While not ideal, I was able to determine a workaround of adding a write statement prior to the return. For example, with this loop

  rad_gas_index = -1
  do igas = 1, 8
    write(*,*) 'checking igas = ', igas
    if (trim(gaslist(igas)).eq.trim(gasname)) then
      rad_gas_index = igas
      return
    endif
  enddo

the results are 0 with Cray Fortran version 14.0.0

> ftn --version
Cray Fortran : Version 14.0.0
> ftn -h zero main.F90
> ./a.out
 CH4 integer =  0
 O3 integer =  0
 CFC12 integer =  0

If I change the loop to

  rad_gas_index = -1
  do igas = 1, 8
    write(*,*) 'checking igas = ', igas
    if (trim(gaslist(igas)).eq.trim(gasname)) then
      rad_gas_index = igas
      write(*,*) ''
      return
    endif
  enddo

then I get the correct results:

> ftn -h zero main.F90
> ./a.out

 CH4 integer =  6

 O3 integer =  2

 CFC12 integer =  8

mattdturner commented 2 years ago

Another workaround is to compile the impacted files (e.g., radconstants.F90) with -O0, or keep the current flags but add -hipa0 to the options for the impacted files.

That could really hurt performance, though, depending on what other routines are in the impacted files.

xyuan commented 2 years ago

@mattdturner Thanks very much. Let me implement the workaround and try a case on Crusher.

sarats commented 2 years ago

The workaround fixes the issue in the interim. We are waiting on a Cray compiler fix for the root cause, and then we can close this.

abbotts commented 2 years ago

@mattdturner's reproducer is fixed in CCE 14.0.2 and CCE 14.0.3. Hopefully we can confirm it fixes the real code too, then close this.
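One way to confirm a compiler fix is a pass/fail check that looks up every entry of gaslist and compares the result with its position. The following is a rough sketch under stated assumptions, not the reproducer used in this thread: the program name verify_rad_gas_index is hypothetical, the lookup is copied into an internal function so the example is self-contained, and a missing name returns -1 instead of calling endrun.

```fortran
! Pass/fail sketch: every gas name must map back to its own index.
program verify_rad_gas_index          ! hypothetical program name
  implicit none
  integer, parameter :: nradgas = 8
  character(len=5), parameter :: gaslist(nradgas) = &
       (/'H2O  ', 'O3   ', 'O2   ', 'CO2  ', 'N2O  ', 'CH4  ', 'CFC11', 'CFC12'/)
  integer :: igas, found, nfail

  nfail = 0
  do igas = 1, nradgas
    found = rad_gas_index(trim(gaslist(igas)))
    if (found /= igas) then
      nfail = nfail + 1
      write(*,*) 'MISMATCH: ', gaslist(igas), ' expected ', igas, ' got ', found
    end if
  end do

  ! A nonzero exit status signals failure to a calling script.
  if (nfail > 0) error stop 1
  write(*,*) 'all ', nradgas, ' gas lookups returned the expected index'

contains

  integer function rad_gas_index(gasname)
    ! Same loop-with-early-return pattern as in radconstants.F90,
    ! except that a missing name returns -1 (assumption for this sketch).
    character(len=*), intent(in) :: gasname
    integer :: i

    rad_gas_index = -1
    do i = 1, nradgas
      if (trim(gaslist(i)) .eq. trim(gasname)) then
        rad_gas_index = i
        return
      end if
    end do
  end function rad_gas_index

end program verify_rad_gas_index
```

Building this with the same flags as the affected file (for example ftn -h zero, as used earlier in the thread) and checking the exit status gives a quick yes/no answer for a given CCE version. It does not replace testing the real model code.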

xyuan commented 2 years ago

Yes, please close this issue. Thanks for your help on this issue.
