Closed ndkeen closed 2 weeks ago
If I let it write core files, it tells me:
#0 0x000000000330e2f0 in phys_grid_ctem::phys_grid_ctem_reg () at /global/cfs/cdirs/e3sm/ndk/repos/me25-apr15/components/eam/src/physics/cam/phys_grid_ctem.F90:137
#1 0x0000000003761032 in inital::cam_initial (dyn_in=..., dyn_out=..., nlfilename=...) at /global/cfs/cdirs/e3sm/ndk/repos/me25-apr15/components/eam/src/dynamics/se/inital.F90:47
#2 0x00000000027fb070 in cam_comp::cam_init (cam_out=0x0, cam_in=0x0, stop_ymd=10106, stop_tod=0) at /global/cfs/cdirs/e3sm/ndk/repos/me25-apr15/components/eam/src/control/cam_comp.F90:162
#3 0x00000000027dd6bc in atm_comp_mct::atm_init_mct (eclock=..., cdata_a=..., x2a_a=..., a2x_a=..., nlfilename=...) at /global/cfs/cdirs/e3sm/ndk/repos/me25-apr15/components/eam/src/cpl/atm_comp_mct.F90:369
#4 0x0000000000936996 in component_mod::component_init_cc (eclock=..., comp=..., comp_init=-443987883, infodata=..., nlfilename=..., seq_flds_x2c_fluxes=..., seq_flds_c2x_fluxes=...) at /global/cfs/cdirs/e3sm/ndk/repos/me25-apr15/driver-mct/main/component_mod.F90:257
#5 0x00000000008fcaba in cime_comp_mod::cime_init () at /global/cfs/cdirs/e3sm/ndk/repos/me25-apr15/driver-mct/main/cime_comp_mod.F90:1488
#6 0x000000000093394f in cime_driver () at /global/cfs/cdirs/e3sm/ndk/repos/me25-apr15/driver-mct/main/cime_driver.F90:122
I was debugging this more. In the newly added subroutine, we see:
subroutine phys_grid_ctem_reg
  !...
  real(r8) :: zalats(nzalat)
  !....
  if (.not. do_tem_diags) return
  !... zalats array actually used
where the value of nzalat is initialized to -huge(1), so this routine uses that value to size the automatic array zalats even if it returns immediately.
Even though this is a bit awkward, it should still be OK, since the standard says an array whose upper bound is below its lower bound gets size 0, and other compilers are fine with it. I then see that with nvidia, we set -Mstack_arrays. Without this flag, the failing cases are able to run, so it is likely a compiler bug. One fairly easy workaround is to disable that flag for this one Fortran unit. As E3SM cannot remove flags conditionally, only add them, I can add -Mnostack_arrays. Other workarounds might be to only call these routines when do_tem_diags=.true., or to set the initial value of nzalat to something like 0.
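As a rough illustration of the last workaround, here is a minimal sketch in which nzalat starts at 0 instead of -huge(1). The module and routine names (ctem_workaround_sketch, ctem_reg_sketch) are made up for this example and are not E3SM code, and I have not verified that this avoids the nvfortran failure with -Mstack_arrays; it only shows the intended shape of the change.

```fortran
module ctem_workaround_sketch
  ! Hypothetical stand-in for phys_grid_ctem; names are illustrative only.
  implicit none
  private
  integer, parameter :: r8 = selected_real_kind(12)
  ! Assumed workaround: start at 0 rather than -huge(1), so the automatic
  ! array below is declared with an ordinary zero extent.
  integer :: nzalat = 0
  logical :: do_tem_diags = .false.
  public :: ctem_reg_sketch
contains
  subroutine ctem_reg_sketch
    real(r8) :: zalats(nzalat)   ! automatic array, sized on entry
    integer :: j
    print *, "nzalat =", nzalat
    print *, "size(zalats) =", size(zalats)
    ! Early return when the diagnostics are switched off.
    if (.not. do_tem_diags) return
    ! Only reached when diagnostics are on and nzalat has been set.
    do j = 1, nzalat
      zalats(j) = real(j, r8)
    end do
  end subroutine ctem_reg_sketch
end module ctem_workaround_sketch

program workaround_demo
  use ctem_workaround_sketch
  implicit none
  call ctem_reg_sketch
  print *, "Done"
end program workaround_demo
```

With nzalat = 0 the automatic array has zero size under any conforming compiler, so the early return is reached without requesting an odd stack allocation. Guarding the call site on do_tem_diags would be a complementary option.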
I also created a reproducer.
MODULE shr_kind_mod
  public
  integer,parameter :: SHR_KIND_R8 = selected_real_kind(12) ! 8 byte real
  integer,parameter :: SHR_KIND_R4 = selected_real_kind( 6) ! 4 byte real
  integer,parameter :: SHR_KIND_RN = kind(1.0) ! native real
  integer,parameter :: SHR_KIND_I8 = selected_int_kind (13) ! 8 byte integer
  integer,parameter :: SHR_KIND_I4 = selected_int_kind ( 6) ! 4 byte integer
  integer,parameter :: SHR_KIND_IN = kind(1) ! native integer
END MODULE shr_kind_mod

module phys_grid_ctem
  use shr_kind_mod, only : r8 => shr_kind_r8
  implicit none
  private
  integer :: nzalat = -huge(1)
  logical :: do_tem_diags = .false.
  public :: phys_grid_ctem_reg
contains
  subroutine phys_grid_ctem_reg
    real(r8) :: zalats(nzalat)
    real(r8) :: z1(nzalat), z2(nzalat), z3(nzalat), z4(nzalat), z5(nzalat)
    integer :: j
    print*, "nzalat=", nzalat
    print*, "size(zalats)", size(zalats)
    if (.not. do_tem_diags) return
    ! actually use zalats
    do j = 1,nzalat
      zalats(j) = 1.0+zalats(j)
    enddo
  end subroutine phys_grid_ctem_reg
end module phys_grid_ctem

program boop
  use phys_grid_ctem
  implicit none
  call phys_grid_ctem_reg
  print*, "Done"
end program boop
Building it with the following command should reproduce the failure:
nvfortran -i4 -Mstack_arrays -Mextend -byteswapio -Mflushz -Kieee -Mallocatable=03 -traceback -O0 -g -Ktrap=fp -Mbounds -Kieee -Mfree arrayallocate-oddvalue.f90
We had some build errors with the nvidia compiler in the last week or so that were corrected (https://github.com/E3SM-Project/E3SM/issues/6332), but now we see runtime errors during init in several tests on pm-cpu. For example:
Checking out different hashes, I see this issue started happening after https://github.com/E3SM-Project/E3SM/pull/6311
The error is not very useful. Even with DEBUG:
I also tried with the default version of the nvidia compiler and still see the same issue. We currently have 22.7, and I just tried 23.9.