E3SM-Project / E3SM

Energy Exascale Earth System Model source code. NOTE: use "maint" branches for your work. Head of master is not validated.
https://docs.e3sm.org/E3SM

Test SMS.ne30_oECv3.BGCEXP_BCRC_CNPRDCTC_1850 fails on cori #3272

Open: jinyuntang opened this issue 4 years ago

jinyuntang commented 4 years ago

Test SMS.ne30_oECv3.BGCEXP_BCRC_CNPRDCTC_1850 fails with: ERROR: ERROR mpas-o buildnml: ocn bgc cannot be run with more than 1 thread. When I run it step by step using --no-setup, after fixing the threading problem, the simulation fails with an error reading a restart file.

jinyuntang commented 4 years ago

I don't know how to fix it. Someone who knows the RD BGC code should be assigned to work on this.

jonbob commented 4 years ago

@jinyuntang - I'll take a look at this. Is it with the current version of master?

jinyuntang commented 4 years ago

@jonbob, I think it still occurs on the current master, unless other changes fixed it by accident.

jonbob commented 4 years ago

@jinyuntang - do you remember which restart file it has trouble reading? I was expecting it to be one of the MPAS components, but my first test hung reading the ELM restart...

jonbob commented 4 years ago

@bishtgautam - it looks like it's failing on the ELM finidat file 180227_cplhist_BGCspinup_ne30_ne30_ICB1850CNPRDCTCBC.clm2.r.0701-01-01-00000.nc; when that file is specified, the code dies with the following traceback:

2613:  NetCDF: Variable not found
2613:  NetCDF: Invalid dimension ID or name
2613:  pio_support::pio_die:: myrank=          -1 : ERROR: nf_mod.F90:        1293 :
2613:  NetCDF: Invalid dimension ID or name
2613: Image              PC                Routine            Line        Source
2613: e3sm.exe           00000000057CF4C6  Unknown               Unknown  Unknown
2613: e3sm.exe           0000000003C30C7A  pio_support_mp_pi         120  pio_support.F90
2613: e3sm.exe           0000000003C2E4E4  pio_utils_mp_chec          74  pio_utils.F90
2613: e3sm.exe           0000000003C11270  nf_mod_mp_pio_inq        1293  nf_mod.F90
2613: e3sm.exe           0000000002A2F18F  ncdio_pio_mp_chec         354  ncdio_pio.F90.in
2613: e3sm.exe           00000000022A9BA8  restfilemod_mp_re        1224  restFileMod.F90
2613: e3sm.exe           00000000022A6BA0  restfilemod_mp_re         588  restFileMod.F90
2613: e3sm.exe           00000000021780A6  clm_initializemod         722  clm_initializeMod.F90
2613: e3sm.exe           00000000021629E3  lnd_comp_mct_mp_l         281  lnd_comp_mct.F90
2613: e3sm.exe           000000000042490F  component_mod_mp_         257  component_mod.F90

If I set finidat to ' ' in user_nl_clm, the test successfully completes a five-day smoke run.
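A minimal sketch of that workaround as a shell function, assuming it is run from the case directory that holds user_nl_clm (the function name clear_finidat is hypothetical, not a CIME tool). It removes any existing finidat line and appends the blank setting, so ELM does not read the initial-condition file. This only sidesteps the failing read; it does not repair whatever the file is missing.

```shell
#!/bin/sh
# Hypothetical helper: blank out ELM's finidat in a user_nl_clm file.
# Defaults to ./user_nl_clm, i.e. assumes the current directory is the case directory.
clear_finidat() {
  nl=${1:-user_nl_clm}
  if [ -f "$nl" ]; then
    # Drop any existing "finidat = ..." line (but not finidat_* variables).
    # grep -v exits 1 when it selects no lines, which is fine here.
    grep -v '^[[:space:]]*finidat[[:space:]]*=' "$nl" > "$nl.tmp" || true
    mv "$nl.tmp" "$nl"
  fi
  # Append the blank setting used in the comment above.
  echo "finidat = ' '" >> "$nl"
}
```

Calling clear_finidat (or clear_finidat path/to/user_nl_clm) before resubmitting leaves exactly one finidat line in the file.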

bishtgautam commented 4 years ago

@jonbob Can you point me to the run directory of your test? I want to check the lnd.log file.

jonbob commented 4 years ago

@bishtgautam -- sure. The debug one is on anvil, at:

/lcrc/group/acme/jwolfe/acme_scratch/anvil/SMS_D.ne30_oECv3.BGCEXP_BCRC_CNPRDCTC_1850.anvil_intel.20191204_145253_susqyx

I resubmitted the non-debug one (also on anvil) at:

/lcrc/group/acme/jwolfe/acme_scratch/anvil/SMS.ne30_oECv3.BGCEXP_BCRC_CNPRDCTC_1850.anvil_intel.20191204_164317_s8m7yq

though it's wallowing in the queue. That directory currently holds a successful run from when I set finidat to ' '. I have another test on cori I could point you to, but cori is down for maintenance...

bishtgautam commented 4 years ago

@jonbob On Compy, I had to set PIO_TYPENAME_<ATM/CPL/LND>=netcdf along with the code mod in 50001af3bcfccd5b77ea1e864ecf61893f6a83b3 to get the test to run successfully.
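The PIO part of that change could be applied with CIME's ./xmlchange from the case directory, as in this sketch (the function name use_serial_netcdf is hypothetical). It sets only the three PIO_TYPENAME variables named above; the code change in 50001af3bcfccd5b77ea1e864ecf61893f6a83b3 is still needed separately.

```shell
#!/bin/sh
# Hypothetical helper: switch the ATM, CPL and LND PIO backends to netcdf.
# Argument 1 is the xmlchange command; defaults to ./xmlchange, which assumes
# the current directory is the case directory.
use_serial_netcdf() {
  xc=${1:-./xmlchange}
  for comp in ATM CPL LND; do
    # Sets e.g. PIO_TYPENAME_ATM=netcdf; stop at the first failure.
    "$xc" "PIO_TYPENAME_${comp}=netcdf" || return 1
  done
}
```

Running use_serial_netcdf in the case directory before resubmitting issues one xmlchange call per component.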