jinyuntang opened this issue 4 years ago
I don't know how to fix it. Someone who knows RD BGC should be assigned to work on this.
@jinyuntang - I'll take a look at this. Does it occur with the current version of master?
@jonbob, I think it may still occur with the current master, unless someone fixed it accidentally as a side effect of other changes.
@jinyuntang - do you remember which restart file it has trouble reading? I expected it to be one of the MPAS components, but my first test hung while reading the ELM restart...
@bishtgautam - it looks like it's failing on the ELM finidat file: 180227_cplhist_BGCspinup_ne30_ne30_ICB1850CNPRDCTCBC.clm2.r.0701-01-01-00000.nc
When that file is specified, the code dies with the following traceback:
2613: NetCDF: Variable not found
2613: NetCDF: Invalid dimension ID or name
2613: pio_support::pio_die:: myrank= -1 : ERROR: nf_mod.F90: 1293 :
2613: NetCDF: Invalid dimension ID or name
2613: Image PC Routine Line Source
2613: e3sm.exe 00000000057CF4C6 Unknown Unknown Unknown
2613: e3sm.exe 0000000003C30C7A pio_support_mp_pi 120 pio_support.F90
2613: e3sm.exe 0000000003C2E4E4 pio_utils_mp_chec 74 pio_utils.F90
2613: e3sm.exe 0000000003C11270 nf_mod_mp_pio_inq 1293 nf_mod.F90
2613: e3sm.exe 0000000002A2F18F ncdio_pio_mp_chec 354 ncdio_pio.F90.in
2613: e3sm.exe 00000000022A9BA8 restfilemod_mp_re 1224 restFileMod.F90
2613: e3sm.exe 00000000022A6BA0 restfilemod_mp_re 588 restFileMod.F90
2613: e3sm.exe 00000000021780A6 clm_initializemod 722 clm_initializeMod.F90
2613: e3sm.exe 00000000021629E3 lnd_comp_mct_mp_l 281 lnd_comp_mct.F90
2613: e3sm.exe 000000000042490F component_mod_mp_ 257 component_mod.F90
If I set finidat to ' ' in user_nl_clm, the test successfully completes a five-day smoke run.
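For anyone reproducing the workaround, it amounts to one namelist line appended to the case's user_nl_clm. This is a minimal sketch, assuming it is run from the CIME case directory. Note that a blank finidat presumably starts ELM from its default cold-start initial conditions instead of the spun-up BGC state, so it sidesteps the failing restart read rather than fixing it.

```shell
# Run from the case directory (assumption: a standard CIME case layout).
# Blank out the ELM initial-conditions file so the finidat restart is not read.
echo "finidat = ' '" >> user_nl_clm
# Regenerate the namelists to confirm the setting is accepted.
./preview_namelists
```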
@jonbob Can you point me to the run directory of your test? I want to check the lnd.log file.
@bishtgautam -- sure. The debug one is on anvil, at:
/lcrc/group/acme/jwolfe/acme_scratch/anvil/SMS_D.ne30_oECv3.BGCEXP_BCRC_CNPRDCTC_1850.anvil_intel.20191204_145253_susqyx
I resubmitted the non-debug one (also on anvil) at:
/lcrc/group/acme/jwolfe/acme_scratch/anvil/SMS.ne30_oECv3.BGCEXP_BCRC_CNPRDCTC_1850.anvil_intel.20191204_164317_s8m7yq
though it's wallowing in the queue. That directory currently holds a successful run from when I set finidat to ' '. I have another test on Cori I could point you to, but Cori is down for maintenance...
@jonbob On Compy, I had to set PIO_TYPENAME_<ATM/CPL/LND>=netcdf, along with the code modification in commit 50001af3bcfccd5b77ea1e864ecf61893f6a83b3, to get the test to run successfully.
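The PIO part of that workaround can be sketched with CIME's xmlchange tool. This assumes the per-component variable names written above are set from the case directory; the code modification in commit 50001af3bcfccd5b77ea1e864ecf61893f6a83b3 is separate and not reproduced here.

```shell
# Run from the case directory. Switch the ATM, CPL and LND components
# to serial netCDF I/O instead of the default PIO type.
./xmlchange PIO_TYPENAME_ATM=netcdf
./xmlchange PIO_TYPENAME_CPL=netcdf
./xmlchange PIO_TYPENAME_LND=netcdf
```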
Test SMS.ne30_oECv3.BGCEXP_BCRC_CNPRDCTC_1850 fails with "ERROR: ERROR mpas-o buildnml: ocn bgc cannot be run with more than 1 thread". When I run it step by step using --no-setup and fix the thread count, the simulation then fails with an error reading the restart file.
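The thread does not say exactly how the thread count was fixed. One plausible way is a minimal sketch, assuming the standard CIME NTHRDS_OCN variable controls the ocean's OpenMP threads and the commands are run from the case directory.

```shell
# Run from the case directory. MPAS-O BGC refuses to build its namelist
# with more than one thread, so force the ocean component to one thread.
./xmlchange NTHRDS_OCN=1
# Re-run setup so the changed PE layout takes effect.
./case.setup --reset
```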