Closed xyuan closed 2 years ago
Hi @xyuan, I am curious whether this problem also affects other gases?
@keziming
yeah, it affects all the gases
cc @twhite-cray @abbotts @mattdturner
The modules loaded are:
Currently Loaded Modules:
  1) craype/2.7.15      5) xalt/1.3.0           9) craype-accel-amd-gfx90a  13) subversion/1.14.0  17) cray-libsci/21.08.1.2
  2) cray-dsmml/0.2.2   6) DefApps/default     10) rocm/4.5.2               14) git/2.31.1         18) cray-hdf5-parallel/1.12.0.7
  3) PrgEnv-cray/8.3.3  7) libfabric/1.15.0.0  11) cray-mpich/8.1.16        15) cmake/3.22.2       19) cray-netcdf-hdf5parallel/4.7.4.7
  4) cce/14.0.0         8) craype-network-ofi  12) cray-python/3.9.4.2      16) zlib/1.2.11        20) cray-parallel-netcdf/1.12.1.7
and the command is: cd /gpfs/alpine/cli115/scratch/yuanx/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.crayclanggpu.1x1/bld/cmake-bld/cmake/atm && python3 /gpfs/alpine/cli115/scratch/yuanx/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.crayclanggpu.1x1/Tools/e3sm_compile_wrap.py /opt/cray/pe/craype/2.7.15/bin/ftn -DBIT64 -DCAM -DCNL -DCO2A -DCPRCRAY -DCRM_DT=10 -DCRM_DX=2000 -DCRM_NX=64 -DCRM_NX_RAD=4 -DCRM_NY=1 -DCRM_NY_RAD=1 -DCRM_NZ=50 -DFORTRANUNDERSCORE -DHAVE_COMM_F2C -DHAVE_F2003_PTR_BND_REMAP -DHAVE_GETTIMEOFDAY -DHAVE_MPI -DHAVE_NANOTIME -DHAVE_SLASHPROC -DHAVE_TIMES -DHAVE_VPRINTF -DLINUX -DLSMLAT=1 -DLSMLON=1 -DMAXPATCH_PFT=numpft+1 -DMCT_INTERFACE -DMMF_SAMXX -DNC=4 -DNDEBUG -DNO_R16 -DNP=4 -DNPG=2 -DN_RAD_CNST=30 -DPCNST=9 -DPCOLS=16 -DPLAT=1 -DPLEV=60 -DPLON=384 -DPSUBCOLS=1 -DPTRK=1 -DPTRM=1 -DPTRN=1 -DSPDLOG_COMPILED_LIB -DSPMD -DYES3DVAL=0 -D_MPDATA -D_MPI -D_PNETCDF -D_PRIM -D__HIP_ROCclr -I/gpfs/alpine/cli115/scratch/yuanx/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.crayclanggpu.1x1/bld/cmake-bld/cmake/atm/yakl -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/cmake/atm/. 
-I/gpfs/alpine/cli115/scratch/yuanx/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.crayclanggpu.1x1/bld/crayclanggpu/mpich/nodebug/nothreads/mct/include -I/gpfs/alpine/cli115/scratch/yuanx/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.crayclanggpu.1x1/bld/crayclanggpu/mpich/nodebug/nothreads/mct/mct/noesmf/c1a1l1i1o1r1g1w1i1e1/include -I/opt/cray/pe/netcdf-hdf5parallel/4.7.4.7/crayclang/10.0/include -I/opt/cray/pe/mpich/8.1.16/ofi/crayclang/10.0/include -I/opt/cray/pe/parallel-netcdf/1.12.1.7/crayclang/10.0/include -I/gpfs/alpine/cli115/scratch/yuanx/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.crayclanggpu.1x1/SourceMods/src.eam -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/crm -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/crm/rrtmgp -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/chemistry/pp_none -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/chemistry/bulk_aero -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/chemistry/aerosol -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/chemistry/mozart -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/chemistry/utils -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/rrtmgp -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/cam -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/dynamics/se -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/homme/src/share -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/homme/src/preqx -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/homme/src/preqx/share -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/cpl -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/control -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/utils 
-I/gpfs/alpine/cli115/scratch/yuanx/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.crayclanggpu.1x1/bld/lnd/obj -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/externals/YAKL -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/externals/YAKL/gptl -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/externals/YAKL/hipCUB/hipcub/include -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/externals/YAKL/rocPRIM/rocprim/include -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/externals/ekat/src -I/gpfs/alpine/cli115/scratch/yuanx/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.crayclanggpu.1x1/bld/cmake-bld/externals/ekat/src -I/gpfs/alpine/cli115/scratch/yuanx/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.crayclanggpu.1x1/bld/cmake-bld/externals/ekat/src/ekat/ekat_f90_modules -I/gpfs/alpine/cli115/scratch/yuanx/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.crayclanggpu.1x1/bld/cmake-bld/externals/kokkos -I/gpfs/alpine/cli115/scratch/yuanx/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.crayclanggpu.1x1/bld/cmake-bld/externals/kokkos/core/src -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/externals/ekat/extern/kokkos/core/src -I/gpfs/alpine/cli115/scratch/yuanx/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.crayclanggpu.1x1/bld/cmake-bld/externals/kokkos/containers/src -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/externals/ekat/extern/kokkos/containers/src -I/gpfs/alpine/cli115/scratch/yuanx/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.crayclanggpu.1x1/bld/cmake-bld/externals/kokkos/algorithms/src -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/externals/ekat/extern/kokkos/algorithms/src -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/externals/ekat/extern/yaml-cpp/include -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/externals/ekat/extern/spdlog/include -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/crm/scream/src/physics/p3/../share 
-I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/crm/scream/src -I/gpfs/alpine/cli115/scratch/yuanx/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.crayclanggpu.1x1/bld/cmake-bld/scream/src -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/crm/scream/src/physics/shoc/../share -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/cmake/atm/../../../externals/YAKL -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/rrtmgp/external/cpp/. -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/rrtmgp/external/cpp/rte -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/rrtmgp/external/cpp/rte/kernels -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/rrtmgp/external/cpp/rrtmgp -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/rrtmgp/external/cpp/rrtmgp/kernels -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/rrtmgp/cpp/../external/cpp -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/rrtmgp/cpp/../external/cpp/rte -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/rrtmgp/cpp/../external/cpp/rrtmgp -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/rrtmgp/cpp/../external/cpp/extensions/cloud_optics -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/rrtmgp/cpp/../external/cpp/extensions/fluxes_byband -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/rrtmgp/cpp/../external/cpp/examples -I/gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/physics/rrtmgp/cpp/../external/cpp/examples/all-sky -em -J. 
-cpp -s default32 -eZ -O2 -h noacc -h zero -hfp0 -I/opt/cray/pe/mpich/8.1.16/ofi/crayclang/10.0/include -I/opt/rocm-4.5.2/include -f free -N 255 -h byteswapio -em -M1077 -DUSE_CONTIGUOUS=contiguous, -c /gpfs/alpine/cli115/scratch/yuanx/e3sm_p3_crusher/components/eam/src/dynamics/se/dyn_grid.F90 -o CMakeFiles/atm.dir//__/eam/src/dynamics/se/dyn_grid.F90.o
@xyuan Thank you for the information about modules and compilation flags, as well as the reproducer you provided.
I was able to replicate this issue on Crusher and narrowed it down to the use of -h zero. Without -h zero, the correct indices are returned from the function.
This is a bug in the Cray Fortran compiler, and an internal ticket has been opened.
In my testing, it looks like the correct results are given if using Cray Fortran version 13.0.2, and the incorrect ones when using Cray Fortran version 14.0.0.
Workaround
While not ideal, I was able to determine a workaround of adding a write statement prior to the return. For example, with this loop
rad_gas_index = -1
do igas = 1, 8
   write(*,*) 'checking igas = ', igas
   if (trim(gaslist(igas)).eq.trim(gasname)) then
      rad_gas_index = igas
      return
   endif
enddo
the results are 0 with Cray Fortran version 14.0.0
> ftn --version
Cray Fortran : Version 14.0.0
> ftn -h zero main.F90
> ./a.out
CH4 integer = 0
O3 integer = 0
CFC12 integer = 0
If I change the loop to
rad_gas_index = -1
do igas = 1, 8
   write(*,*) 'checking igas = ', igas
   if (trim(gaslist(igas)).eq.trim(gasname)) then
      rad_gas_index = igas
      write(*,*) ''
      return
   endif
enddo
then I get the correct results:
> ftn -h zero main.F90
> ./a.out
CH4 integer = 6
O3 integer = 2
CFC12 integer = 8
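For anyone who wants to try this on another compiler version, a standalone main.F90 along the following lines might serve as a starting point. This is only a sketch, not the reproducer used above: the module name radconstants_sketch and the check helper are hypothetical, and the gas order in gaslist is an assumption chosen to match the indices printed above (O3 = 2, CH4 = 6, CFC12 = 8). It omits the diagnostic write inside the loop, so it may or may not trigger the bug in exactly the same way.

```fortran
! main.F90: hypothetical standalone sketch, not the file used in this thread.
module radconstants_sketch
  implicit none
  private
  public :: rad_gas_index

  ! Assumed gas order, chosen to match the indices printed in the thread
  ! (O3 = 2, CH4 = 6, CFC12 = 8).
  character(len=5), parameter :: gaslist(8) = &
       [character(len=5) :: 'H2O', 'O3', 'O2', 'CO2', 'N2O', 'CH4', 'CFC11', 'CFC12']

contains

  ! Return the position of gasname in gaslist, or -1 if it is not listed.
  integer function rad_gas_index(gasname)
    character(len=*), intent(in) :: gasname
    integer :: igas

    rad_gas_index = -1
    do igas = 1, size(gaslist)
      if (trim(gaslist(igas)) == trim(gasname)) then
        rad_gas_index = igas
        return   ! early return from inside the loop, as in the thread
      end if
    end do
  end function rad_gas_index

end module radconstants_sketch

program main
  use radconstants_sketch, only: rad_gas_index
  implicit none

  call check('CH4', 6)
  call check('O3', 2)
  call check('CFC12', 8)

contains

  ! Print the returned index and flag it if it differs from the expected one.
  subroutine check(gasname, expected)
    character(len=*), intent(in) :: gasname
    integer, intent(in) :: expected
    integer :: idx

    idx = rad_gas_index(gasname)
    write(*,*) gasname, ' integer = ', idx
    if (idx /= expected) write(*,*) '  MISMATCH: expected ', expected
  end subroutine check

end program main
```

Building it once with ftn -h zero main.F90 and once without -h zero, then comparing the printed indices, mirrors the comparison described above.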
Another workaround is to compile the impacted files (e.g., radconstants.F90) with -O0, or keep the current flags but add -hipa0 to the options for the impacted files.
That could really hurt performance, though, depending on what other routines are in the impacted files.
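A further option, which nobody in this thread has tried, would be to restructure the lookup so that it leaves the loop with exit and returns from a single point. Whether this avoids the bug in Cray Fortran 14.0.0 is an assumption and would need to be checked with -h zero. The sketch below uses hypothetical names (rad_gas_lookup_sketch, rad_gas_index_single_exit) and an assumed gas order.

```fortran
! Hypothetical single-exit variant of the lookup. It is NOT confirmed to
! avoid the compiler bug discussed here.
module rad_gas_lookup_sketch
  implicit none
  private
  public :: rad_gas_index_single_exit

  ! Assumed gas order (matches O3 = 2, CH4 = 6, CFC12 = 8 from the thread).
  character(len=5), parameter :: gaslist(8) = &
       [character(len=5) :: 'H2O', 'O3', 'O2', 'CO2', 'N2O', 'CH4', 'CFC11', 'CFC12']

contains

  ! Same lookup as before, but the loop is left with exit instead of return,
  ! so the function has exactly one exit point.
  function rad_gas_index_single_exit(gasname) result(idx)
    character(len=*), intent(in) :: gasname
    integer :: idx
    integer :: igas

    idx = -1
    do igas = 1, size(gaslist)
      if (trim(gaslist(igas)) == trim(gasname)) then
        idx = igas
        exit
      end if
    end do
  end function rad_gas_index_single_exit

end module rad_gas_lookup_sketch

program try_single_exit
  use rad_gas_lookup_sketch, only: rad_gas_index_single_exit
  implicit none

  write(*,*) 'CH4 integer = ', rad_gas_index_single_exit('CH4')
  write(*,*) 'O3 integer = ', rad_gas_index_single_exit('O3')
  write(*,*) 'CFC12 integer = ', rad_gas_index_single_exit('CFC12')
end program try_single_exit
```

Unlike the -O0 or -hipa0 options, this changes source code rather than build flags, so it would only be worth considering if the flag-based workarounds cost too much performance.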
@mattdturner Thanks very much, let me implement the workaround and try a case on crusher
The workaround fixed the issue in the interim. Waiting on a Cray compiler fix for the root cause, and then we can close this.
@mattdturner's reproducer is fixed in CCE 14.0.2 and CCE 14.0.3. Hopefully we can confirm it fixes the real code too, then close this.
Yes, please close this issue. Thanks for your help on this issue.
From: Steve Abbott
Date: Tuesday, September 6, 2022 at 1:41 PM
To: E3SM-Project/E3SM
Cc: Yuan, Xingqiu; Mention
Subject: Re: [E3SM-Project/E3SM] Cray Fortran: incorrect index return value with -h zero (known workaround, Cray ticket open) (Issue #5012)
There is a Cray Fortran compiler issue associated with the use of return in a Fortran function; see below.
The example code can be found at https://github.com/E3SM-Project/E3SM/blob/master/components/eam/src/physics/rrtmgp/radconstants.F90