E3SM-Project / E3SM

Energy Exascale Earth System Model source code. NOTE: use "maint" branches for your work. Head of master is not validated.
https://docs.e3sm.org/E3SM

Compset FC5AV1C-H01B for 1/4 degree crashes in land init #2637

Closed oksanaguba closed 1 year ago

oksanaguba commented 5 years ago

Conf. line for the issue is:

${e3sm}/cime/scripts/create_newcase -case $fol -res ne120_ne120 -mach anvil -compiler intel -compset FC5AV1C-H01B

Tested with master commit fc738180c8840.

e3sm log:

[2664]  NetCDF: Invalid dimension ID or name
[0]  NetCDF: Invalid dimension ID or name
[1152] Image              PC                Routine            Line        Source
[1152] e3sm.exe           000000000261686B  Unknown               Unknown  Unknown
[1152] e3sm.exe           000000000235E131  pio_support_mp_pi         118  pio_support.F90
[1152] e3sm.exe           000000000235C215  pio_utils_mp_chec          59  pio_utils.F90
[1152] e3sm.exe           0000000002345432  nf_mod_mp_pio_inq        1288  nf_mod.F90
[1152] e3sm.exe           00000000018A7534  ncdio_pio_mp_chec         354  ncdio_pio.F90.in
[1152] e3sm.exe           00000000018DB2DA  restfilemod_mp_re        1072  restFileMod.F90
[1152] e3sm.exe           0000000001796379  clm_initializemod         722  clm_initializeMod.F90
[1152] e3sm.exe           000000000176E3B7  lnd_comp_mct_mp_l         236  lnd_comp_mct.F90
[1152] e3sm.exe           0000000000436578  component_mod_mp_         246  component_mod.F90
[1152] e3sm.exe           0000000000426994  cime_comp_mod_mp_        1243  cime_comp_mod.F90
[1152] e3sm.exe           0000000000433679  MAIN__                    122  cime_driver.F90
[1152] e3sm.exe           000000000041651E  Unknown               Unknown  Unknown
[1152] libc-2.12.so       00002ABE9A522D1D  __libc_start_main     Unknown  Unknown
[1152] e3sm.exe           00000000004163A9  Unknown               Unknown  Unknown

File in question: finidat = '/home/ccsm-data/inputdata/lnd/clm2/initdata_map/clmi.ICRUCLM45.ne120_g16.1155ba0.clm2.r.nc'
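
For readers who want to pull the call chain out of a log like the one above, here is a minimal Python sketch. It assumes the column layout shown in this traceback (rank, image, PC, routine, line, source). The names `FRAME` and `parse_frames` are made up for illustration and are not part of E3SM or CIME.

```python
import re

# One traceback frame: "[rank] image PC routine line source".
FRAME = re.compile(
    r"^\[(\d+)\]\s+(\S+)\s+([0-9A-Fa-f]{16})\s+(\S+)\s+(\S+)\s+(\S+)$"
)

def parse_frames(log_text):
    """Return (routine, source, line) for frames that carry source info."""
    frames = []
    for raw in log_text.splitlines():
        match = FRAME.match(raw.strip())
        if match is None:
            continue  # header row, NetCDF message, or unrelated log line
        _rank, _image, _pc, routine, line, source = match.groups()
        if not line.isdigit():
            continue  # "Unknown" frames have no usable location
        frames.append((routine, source, int(line)))
    return frames

# Three frames copied from the log in this issue.
SAMPLE = """\
[1152] e3sm.exe           00000000018A7534  ncdio_pio_mp_chec         354  ncdio_pio.F90.in
[1152] e3sm.exe           00000000018DB2DA  restfilemod_mp_re        1072  restFileMod.F90
[1152] e3sm.exe           0000000001796379  clm_initializemod         722  clm_initializeMod.F90
"""

for routine, source, line in parse_frames(SAMPLE):
    print(f"{source}:{line}  ({routine})")
```

Read from the bottom up, the frames suggest the error is raised while the land restart (finidat) file is being read during land initialization.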

rljacob commented 5 years ago

That's probably another file that needs to be modified because of https://github.com/E3SM-Project/E3SM/pull/2564. @oksanaguba just set finidat="" in user_nl_clm to get running.
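
A minimal sketch of that workaround in Python, assuming the case directory already contains (or may contain) a `user_nl_clm` file of `name = value` lines. The helper name `set_cold_start` and the `case_dir` argument are hypothetical and only for illustration.

```python
import re
from pathlib import Path

# Matches an existing "finidat = ..." override (comment lines start with "!").
FINIDAT_LINE = re.compile(r"^\s*finidat\s*=", re.IGNORECASE)

def set_cold_start(case_dir, value=""):
    """Replace any finidat override in user_nl_clm with an empty one."""
    nl_path = Path(case_dir) / "user_nl_clm"
    existing = nl_path.read_text() if nl_path.exists() else ""
    # Keep every line except earlier finidat settings, so only one remains.
    kept = [ln for ln in existing.splitlines() if not FINIDAT_LINE.match(ln)]
    kept.append(f'finidat = "{value}"')
    nl_path.write_text("\n".join(kept) + "\n")
    return nl_path
```

Calling `set_cold_start("/path/to/case")` (a placeholder path) leaves the other namelist overrides untouched. The land model then starts from a cold state and is not spun up.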

oksanaguba commented 5 years ago

Thanks. I need climatologies, so I was told that finidat='' won't solve the problem. We have an older fork, and I am running that instead of master for now.

PeterCaldwell commented 5 years ago

@rljacob and others - sorry for being out of the loop, but what are the plans for getting new finidat files for the compsets we use? Do I need to make a JIRA task and push on this for it to happen, or is it happening already without my intervention?

rljacob commented 5 years ago

If you can make a list of finidat files from the compsets you use, assign a task to @thorntonpe to convert.

whannah1 commented 5 years ago

Why doesn't the model run without specifying the finidat file?

This is a long-standing issue and omitting the file works for other resolutions, so I would assume it would work for older branches as well. My plan for SP-E3SM was to spin up the land model without specifying this file and then use the resulting land restart file for finidat going forward.

PeterCaldwell commented 5 years ago

@rljacob - thanks. The files I need are:

in 1950_CMIP6HR_control.xml: lnd/clm2/initdata_map/clmi.A_WCYCL1950.ne120np4_oRRS18to6v3_ICG_simyr1950_c20180124.nc

in 1950_CMIP6LR_control.xml: lnd/clm2/initdata/clmi.A_WCYCL1950S_CMIP6_LRtunedHR.ne30_oECv3_ICG_simyr1950_c20180316.nc

It may take me a while to make the JIRA tasks since I'm on vacation this week. In compiling my list, I noticed that only 1850_CMIP6_control.xml was updated for this change. @golaz - you probably want to ask for the CMIP6 transient to get updated. @susburrows - it looks like your BGC compsets also need updating. @stephenprice - I see some cryo compsets that may also need to be updated to work with this change to the info needed from land init files...

susburrows commented 5 years ago

@PeterCaldwell thanks for looping me in.

We don't have #2564 on the v1.1 branch, so this won't affect our current simulation campaign. When we finalize the initial conditions for the BGC campaign, we will update the appropriate compsets on both master and maint-1.1, which I think will resolve this issue for the BGC compsets. @thorntonpe will be able to confirm.

susburrows commented 5 years ago

Oh wait, I take it back, I guess those files will need to be converted to work with the new sub-grid topographic levels? In any case I don't think we will want to worry about this until we have the new initial condition files finalized.

thorntonpe commented 5 years ago

@whannah1 , the approach you suggest is the one that I think should be used in general by all developers. There are a few specific cases where spun-up files are required, and are either difficult to generate from scratch or require too many years of simulation. In those cases, I am creating new finidat files by adding the necessary variables and new dimension to the existing file. There are still disequilibrium conditions generated by my approach, so I encourage everyone to do some additional spinup of the model even with the new files.

If you really want/need an equilibrium state for a publication-quality simulation, then you should do a spin-up from a cold start (finidat = " "), in my opinion.

thorntonpe commented 5 years ago

@PeterCaldwell , can you give me a little background on how you are using these compsets? It may be better in the long term for you to generate your own finidat file from a cold-start. What are your requirements for equilibrium state for the land model? For the BGC experiments, we have very exacting requirements, and we know how to generate the finidat files that are necessary to meet them. That usually involves long spin-up simulations. Without the BGC codes turned on, the spin-up is usually much less demanding, and depending on your requirements it may be very short.

thorntonpe commented 5 years ago

@susburrows Any science experiments that we branch from the existing runs will need to use the current v1.1 code. If we change the code, we need to re-do the spinups to have valid experiments. I'm assuming that when we move to v2 for new BGC experiments, we will have new science code, and will need new spinups. At that point we would do a cold-start with the new code, which generates the correct files for initialization of further spin-up run segments.

susburrows commented 5 years ago

@thorntonpe got it. So are you thinking that for the purposes of regression testing, until new initial conditions are generated, the BGC compsets will just use a cold start?

PeterCaldwell commented 5 years ago

@thorntonpe - thanks for the reply, which brings up some good points. The ne120 init file I need definitely falls within the "difficult to generate from scratch/requires too many years of simulation" category, since 1 yr of coupled high-res simulation costs 2M core hrs and takes >1 day to complete. When we talked with @bishtgautam to generate the previous IC we were using, we decided to interpolate from the CMIP6 DECK 1950 run to create finidat. I'm open to other options. The ne30 init file could be run to equilibrium, but its whole purpose is to be a low-res companion to the ne120 run, so doing the same thing in both cases would be beneficial.

thorntonpe commented 5 years ago

I have already addressed all the existing tests in the land developer and integrator's test suite by adding new finidat files where necessary. If we are thinking of new regression tests that bring in additional coupled compsets, then I would suggest we take the time to do the cold-start approach, and perform at least some preliminary spin-up, then use those finidat files. Having a combination of cold-start and partly spun-up tests is probably a good idea, since we need to "protect" both types of runs with our testing.

thorntonpe commented 5 years ago

@PeterCaldwell , it's helpful to know how these two resolutions are intended to interact. Do you know what the requirement is for closeness to equilibrium in the land model for your testing? Using satellite phenology, as you are doing in these compsets, you get a first-order approximation of the land system at cold-start. If the disequilibrium of soil hydrology at start-up, and associated drifts in surface fluxes, causes problems for your testing, then I can see the need for a spun-up state, even if it is a very approximate one.

I'm thinking that, if a somewhat equilibrated land state is what you need, then the best thing to do is to run an offline land simulation from cold-start at the required resolution. That is much faster, and even though it would be driven by reanalysis climate and not the coupled model climate, you would still get a quasi-equilibrated state as far as soil hydrology is concerned. Doing that, you could generate the finidat files at different resolutions from the same drivers, and your LR and HR runs would have very comparable land initial conditions. What do you think of that option?

darincomeau commented 5 years ago

@thorntonpe, is a new finidat file being generated for:

/project/projectdirs/acme/inputdata/lnd/clm2/initdata_map/clmi.I1850CLM45.ne30_oECv3wLI.clm2.r.0331-01-01-00000.nc

As discussed by email, the cryosphere simulations need this file to begin our coupled simulations.

thorntonpe commented 5 years ago

@darincomeau Yes, I'm working on that. Sorry it is taking longer than expected.

darincomeau commented 5 years ago

@thorntonpe No worries - thanks.

susburrows commented 5 years ago

@thorntonpe thanks -- sounds like there is nothing to do right now for v1, but I do wonder if the BGC compsets will need new initial condition files (maybe just converted from the v1 files) that can be used for development and testing purposes in the interim period before the v2 model is finalized and spun up. I am copying @kvcalvin so that she is aware that this is something that will need to be addressed for the v2 model / experiments.

rljacob commented 5 years ago

These spinups for finidat files are done with I-cases right? So they should be cheap and fast even at high resolution.

thorntonpe commented 5 years ago

I'm not positive that it has been done that way consistently in the past, but that is what I would suggest for cases that need spun-up initial conditions for testing purposes, as opposed to very refined spun-up states for science experiments. See my most recent reply above to Peter Caldwell.

rljacob commented 5 years ago

@thorntonpe you said "I have already addressed all the existing tests in the land developer and integrator's test suite by adding new finidat files where necessary". We also need new files for the e3sm_high_res test suite.

thorntonpe commented 5 years ago

Sorry - is this still an issue? I thought this was solved with the new finidat file created by @bishtgautam. If there are additional tests that are failing, can someone provide a list?

rljacob commented 5 years ago

SMS.ne120_ne120.FC5AV1C-H01A
SMS.ne120_oRRS18v3_ICG.A_WCYCL2000_H01AS
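
To sort test names like these by grid or compset, a small Python sketch follows. It assumes the names follow a dot-separated `TESTTYPE.GRID.COMPSET` pattern, which is how the two names above appear to be built. The function `split_test_name` is a hypothetical helper, not a CIME API.

```python
def split_test_name(name):
    """Split 'TESTTYPE.GRID.COMPSET[...]' into its first three fields."""
    parts = name.split(".")
    if len(parts) < 3:
        raise ValueError(f"expected TESTTYPE.GRID.COMPSET, got {name!r}")
    # Any trailing fields (machine, compiler, test mods) are ignored here.
    return {"test_type": parts[0], "grid": parts[1], "compset": parts[2]}

failing = [
    "SMS.ne120_ne120.FC5AV1C-H01A",
    "SMS.ne120_oRRS18v3_ICG.A_WCYCL2000_H01AS",
]

for test in failing:
    info = split_test_name(test)
    print(f"{info['compset']:<22} on grid {info['grid']}")
```

Grouping by compset in this way makes it easier to see which finidat defaults would need a replacement file.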

thorntonpe commented 5 years ago

OK. Are we sure that these tests really need to use land finidat files, and cannot be served just as well by a land cold-start condition? We've seen cases that go both ways as we've addressed other test failures. For the cases that don't really need a spun-up land condition, it is better for long-term maintenance to change the test to use a cold start.

rljacob commented 5 years ago

2 more:
enax4v1_enax4v1.FC5AV1C-L
twpx4v1_twpx4v1.FC5AV1C-L

I'm not sure how to decide what can be done with a cold-start or not. Is there a difference in what science code gets executed for spun-up vs. cold start?

PeterCaldwell commented 5 years ago

Sorry for dropping off this thread before we reached resolution. I still believe that any compset used for science needs to have a finidat unless it's explicitly clear to users that they will need to spin the land up from scratch... Otherwise the resulting simulation is not scientifically credible. For high-res compsets at least, long spin-ups are not a reasonable proposition. For that reason, we should have at least one test for each science compset which has specified finidat: if compsets break because their IC data quits working, we should know about it. That's why the logic "tests aren't used for science, so we don't need finidat" doesn't work for me.

That said, I feel like the ball's in my court for deciding how high-res finidat files should be created. As Peter T said a while ago, spinning up from a cold start is the best way to go... Unfortunately, that is too expensive for a model that takes 100M core hrs to run for 50 yrs. So far on the high-res project, we've interpolated the land IC from the low-res DECK runs. I suspect that will be the path forward in the future. I haven't gotten back to you about creating new finidat files, Peter T, because I've been super busy with visitors and getting ready for AGU. I'm hoping to prioritize this the week after AGU.
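
The cost argument can be made concrete with a short Python calculation. It uses only the 100M core hrs for 50 yrs figure quoted above, which works out to the 2M core hrs per simulated year mentioned earlier in the thread. The spin-up lengths in the loop are hypothetical examples, not recommendations.

```python
# Figures quoted in this thread for the coupled high-res configuration.
TOTAL_CORE_HOURS = 100_000_000
TOTAL_SIM_YEARS = 50
CORE_HOURS_PER_YEAR = TOTAL_CORE_HOURS // TOTAL_SIM_YEARS  # 2M per year

def coupled_core_hours(sim_years, per_year=CORE_HOURS_PER_YEAR):
    """Core hours needed to run sim_years of the coupled high-res model."""
    return sim_years * per_year

# Hypothetical spin-up lengths, chosen only to illustrate the scaling.
for years in (1, 10, 50):
    print(f"{years:>2} yr coupled spin-up: {coupled_core_hours(years):,} core hrs")
```

Since the cost scales linearly with simulated years, even a short coupled spin-up uses a large share of the production budget, which is the motivation for interpolating from low-res runs or spinning up the land offline.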

thorntonpe commented 5 years ago

I'm happy to help. Remember that the land can be spun up from cold start offline, which is much cheaper and faster than a coupled run. Having a segment of coupled climate output at high frequency is good, but not essential.