Closed nmizukami closed 8 months ago
Hi Erik (@ekluzek), using pioSystem
as a pointer did not work. It compiled with both gnu and intel, but did not run for both. No obvious messages (I think it died when it tried to create the file, which uses pioSystem
). I am thinking it is the best to leave this as is (so leave pioSystem
defined as pio_system type, not pointer) and figure out what to do later since this works both stand-alone and cesm coupling now. or any other idea?
Hi Erik (@ekluzek), using
pioSystem
as a pointer did not work. It compiled with both gnu and intel, but did not run for both. No obvious messages (I think it died when it tried to create the file, which usespioSystem
). I am thinking it is the best to leave this as is (so leavepioSystem
defined as pio_system type, not pointer) and figure out what to do later since this works both stand-alone and cesm coupling now. or any other idea?
@nmizukami yeah if it works I guess we can leave it as is. I'm nervous about it because the CESM interface is returning a pointer and we aren't treating it that way. But, if it works, I guess it's OK. Something to check though, is to make sure it works for the compilers we want it to. And I'm also concerned that maybe the reason it works happens to be because of the internals of either PIO or the compilers we use -- so that if either changes -- it'll break. But, we can also handle that when it comes rather than now, especially since this is problematic. I'd also like to show this to Jim Edwards and see if he has thoughts on this. So we should schedule a meeting for that.
@nmizukami I ran the tests on cheyenne and they were all fine running as expected. The tests on izumi had some tests pass, but the NAG tests showed problems...
ERS_D_Mmpi-serial.5x5_amazon_r05.I2000Clm50SpMizGs.izumi_nag.mizuroute-default RUN SMS_D.5x5_amazon_r05.I2000Clm50SpMizGs.izumi_nag.mizuroute-default RUN SMS_D_Mmpi-serial.5x5_amazon_r05.I2000Clm50SpMizGs.izumi_nag.mizuroute-default RUN
With DEBUG mode turned on in above, you'll see things like dangling pointers caught for example in that middle one.
Since, the last one is the simplest this is what you see captured in the traceback for cesm.log:
Completion(send) value=1072724297 tag=1
Completion(send) value=977411098 tag=1
Completion(send) value=1199980800 tag=1
Completion(sRuntime Error: /fs/cgd/data0/erik/ctsm_worktree/mizuroute/components/mizuRoute/route/build/src/historyFile.f90, line 407: Reference to dangling pointer THIS%IODESCHRUFLUX
Target was RETURNed from procedure HISTORYFILE:SET_COMPDOF_RCH_HRU
Program terminated by fatal error
/fs/cgd/data0/erik/ctsm_worktree/mizuroute/components/mizuRoute/route/build/src/historyFile.f90, line 407: Error occurred in HISTORYFILE:WRITE_FLUX_HRU
/fs/cgd/data0/erik/ctsm_worktree/mizuroute/components/mizuRoute/route/build/src/historyFile.f90, line 372: Called by HISTORYFILE:WRITE_FLUX
/fs/cgd/data0/erik/ctsm_worktree/mizuroute/components/mizuRoute/route/build/src/write_simoutput_pio.f90, line 210: Called by WRITE_SIMOUTPUT_PIO:OUTPUT
/fs/cgd/data0/erik/ctsm_worktree/mizuroute/components/mizuRoute/route/build/cpl/RtmMod.F90, line 676: Called by RTMMOD:ROUTE_RUN
/fs/cgd/data0/erik/ctsm_worktree/mizuroute/components/mizuRoute/route/build/cpl/nuopc/rof_comp_nuopc.F90, line 792: Called by ROF_COMP_NUOPC:MODELADVANCE
/fs/cgd/data0/erik/ctsm_worktree/mizuroute/components/cmeps/cime_config/../cesm/driver/esmApp.F90, line 141: Called by ESMAPP
end) value=1124738608 tag=1
Thank you Erik.
So I believe this is nmizukami:cesm-coupling_pio-decomp branch (make sure I am looking at the same code)?
Dangling pointer means that a target is deallocated (or does not exist for some reason)? I wonder if I actually need to clean (nullify the pointer) when a history netcdf is finished and closed before new history file (= before histFile object is created) like this:
SUBROUTINE cleanup_rch_hru(this)
implicit none
class(histFile), intent(inout) :: this
nullify(this%ioDescRchFlux)
nullify(this%ioDescHruFlux)
END SUBROUTINE cleanup_rch_hru
Other than that, I don't see apparent error in the code.
Hi @nmizukami. I looked at this some more to see a little bit of what's happening. I was going to try your idea of nullifying pointers, but I realized that this isn't happening at the end -- it's fairly early on. But, it's also after other successful output happens.
Here's a code snippet where this is happening...
call write_netcdf(this%pioFileDesc, 'time', [hVars_local%timeVar(1)], [this%iTime], [1], ierr, cmessage)
if(ierr/=0)then; message=trim(message)//trim(cmessage); return; endif
call write_netcdf(this%pioFileDesc, 'time_bounds', hVars_local%timeVar, [1,this%iTime], [2,1], ierr, cmessage)
if(ierr/=0)then; message=trim(message)//trim(cmessage); return; endif
if (.not.this%gageOutput) then
if (meta_hflx(ixHFLX%basRunoff)%varFile) then
call this%write_flux_hru(hVars_local, ierr, cmessage). !!! <<<< dying in here
if(ierr/=0)then; message=trim(message)//trim(cmessage); return; endif
endif
end if
call this%write_flux_rch(hVars_local, index_write, ierr, cmessage)
if(ierr/=0)then; message=trim(message)//trim(cmessage); return; endif
call sync_file(this%pioFileDesc, ierr, cmessage)
if(ierr/=0)then; message=trim(message)//trim(cmessage); return; endif
END SUBROUTINE write_flux
Here's my casedir on izumi...
/scratch/cluster/erik/tests_mizu_c-cpln3_v211_ctsm51d114m/SMS_D_Mmpi-serial.5x5_amazon_r05.I2000Clm50SpMizGs.izumi_nag.mizuroute-default.GC.mizu_c-cpln3_v211_ctsm51d114m_nag
Ok, so nullify the pointer did not work. So need to think something else... do you think something like this is related to this error? I was googling with "fortran dangling pointer" and ran into this.
Hi Erik (@ekluzek), using
pioSystem
as a pointer did not work. It compiled with both gnu and intel, but did not run for both. No obvious messages (I think it died when it tried to create the file, which usespioSystem
). I am thinking it is the best to leave this as is (so leavepioSystem
defined as pio_system type, not pointer) and figure out what to do later since this works both stand-alone and cesm coupling now. or any other idea?@nmizukami yeah if it works I guess we can leave it as is. I'm nervous about it because the CESM interface is returning a pointer and we aren't treating it that way. But, if it works, I guess it's OK. Something to check though, is to make sure it works for the compilers we want it to. And I'm also concerned that maybe the reason it works happens to be because of the internals of either PIO or the compilers we use -- so that if either changes -- it'll break. But, we can also handle that when it comes rather than now, especially since this is problematic. I'd also like to show this to Jim Edwards and see if he has thoughts on this. So we should schedule a meeting for that.
Hi Erik, I change pioSystem
to a pointer in globalData.f90. As Jim said, allocating pioSystem
before calling pio_sys_init
works for standalone. cesm-coupling mode also works (I just tested one case - f09_f09_rMERIT with cheyenne intel.
Hi Erik, I updated handling of the pio decomposition data for history file output. At least there are no pointers (except for pioSystem), so will need to test with nag compiler in izumi. At cheyenne, it works as expected.
@nmizukami I ran that one test on izumi, and it PASSes! So I'll rerun all the testing which hopefully will all work, and then we can merge this and make a tag!
@nmizukami I ran that one test on izumi, and it PASSes! So I'll rerun all the testing which hopefully will all work, and then we can merge this and make a tag!
Ah, great! thank you for testing! Yes, we can merge and tag it.
I also want to setup and run ctsm-mizuroute in izumi (something I want to talk about today).
Tests on izumi all PASS, the Cheyenne tests are going:
ERI_Mmpi-serial.5x5_amazon_r05.I2000Clm50SpMizGs.izumi_gnu.mizuroute-default ERS_D_Mmpi-serial.5x5_amazon_r05.I2000Clm50SpMizGs.izumi_nag.mizuroute-default PET_Mmpi-serial_P1x25.5x5_amazon_r05.I2000Clm50SpMizGs.izumi_intel.mizuroute-default SMS_D.5x5_amazon_r05.I2000Clm50SpMizGs.izumi_intel.mizuroute-default SMS_D.5x5_amazon_r05.I2000Clm50SpMizGs.izumi_nag.mizuroute-default SMS_D_Mmpi-serial.5x5_amazon_r05.I2000Clm50SpMizGs.izumi_gnu.mizuroute-default SMS_D_Mmpi-serial.5x5_amazon_r05.I2000Clm50SpMizGs.izumi_intel.mizuroute-default SMS_D_Mmpi-serial.5x5_amazon_r05.I2000Clm50SpMizGs.izumi_nag.mizuroute-default
OK, tests on Cheyenne are as expected. So this is ready to merge and create a tag.
Move pio decomposition initialization to the model initialization (beginning of the model execution). Idea is to define decomposition data once and share pio system (which decomposition data is attached to) for all the history files and restart file, and avoid defining decomposition data every time the output files are created.
This does not change any science and should re-produce the results from the codes prior to this PR.
resolves #409 resolves #438 resolves #439