NOAA-EMC / MOM6

Modular Ocean Model

Add end of run restart functionality to MOM6 #133

Closed dpsarmie closed 4 months ago

dpsarmie commented 5 months ago

This PR allows the user to create restart files at the end of a run in MOM using the write_restart_at_endofrun configuration option in CMEPS. This configuration option will control end of run restarts for MOM6, CICE, and CMEPS. This PR closes NOAA-EMC/CMEPS#118 and closes ufs-community/ufs-weather-model#2236. A similar PR was made in the CICE repository (CICE PR#77) that will completely resolve these issues.

The code was tested on Hera using the regression test datm_cdeps_control_gefs, with different combinations of restart settings to confirm the expected behavior. If the option is unset or set to False, no restart file is created at the end of the run. Setting the option to true creates the file. The end of run restarts for CMEPS, CICE, and MOM will all be controlled with this single configuration option.

jiandewang commented 5 months ago

@DeniseWorthen can you do a preliminary code review before I reach out to the NCAR side?

jiandewang commented 5 months ago

@dpsarmie I made a try with the latest UWM but replaced MOM6 with your branch. Somehow I am not getting what I expected. I used cpld_control_c48_intel as a template and changed restart_n from 12 to 18. The run length is 24hr and the IC is 20210306. With write_restart_at_endofrun=T it shall give us 20210400 and 20240406 ocean restart files. But I only see the first one being written out. My run dir is on HERA: /scratch1/NCEPDEV/stmp2/Jiande.Wang/FV3_RT/rt_272814/cpld_control_c48_inte-T

can you take a look ? is there anything I missed or is there a specific CMEPS branch that I shall use ? my UWM: /scratch1/NCEPDEV/climate/Jiande.Wang/working/scratch/MOM6-eor/ufs-weather-model, note here I replaced MOM6 with your branch

DeniseWorthen commented 5 months ago

@jiandewang I believe the test you want to do is to leave restart_n=12 but make the run length 27. You should get restarts at hour 12,24 and 27. Without this 'end-of-run' setting, you would only get restarts at 12,24, even though you run all the way to 27.
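
The schedule described above can be sketched in a few lines. This is only an illustration of the expected behavior, assuming restart_n and the run length are both whole hours; `expected_restart_hours` is a hypothetical helper and not part of MOM6 or CMEPS.

```python
def expected_restart_hours(run_hours, restart_n, write_at_end):
    """Forecast hours at which a restart file is expected (illustrative only)."""
    if restart_n <= 0:
        raise ValueError("restart_n must be a positive number of hours")
    # Regular restarts fire every restart_n hours, up to and including the end.
    hours = list(range(restart_n, run_hours + 1, restart_n))
    # The end-of-run option adds one more restart at the final hour,
    # unless a regular restart already landed there.
    if write_at_end and (not hours or hours[-1] != run_hours):
        hours.append(run_hours)
    return hours


print(expected_restart_hours(27, 12, True))   # [12, 24, 27]
print(expected_restart_hours(27, 12, False))  # [12, 24]
```

With a run length that is an exact multiple of restart_n, the option makes no visible difference, which is why a 27 hour run is a more telling test than a 24 hour one.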

jiandewang commented 5 months ago

@DeniseWorthen in order to extend to a 27hr run, do I need to change stop_n to 27 (along with model_configure nhours_fcst=27)?

DeniseWorthen commented 5 months ago

@jiandewang stop_n should be set == to fhmax

jiandewang commented 5 months ago

Just made a quick test, see /scratch1/NCEPDEV/stmp2/Jiande.Wang/FV3_RT/rt_272814/cpld_control_c48_inte-TDW
restart_n = 12
stop_n = 30
nhours_fcst: 30

still don't see the final restart file
/scratch1/NCEPDEV/stmp2/Jiande.Wang/FV3_RT/rt_272814/cpld_control_c48_inte-TDW[215]ls -l RESTART/MOM
-rw-r--r-- 1 Jiande.Wang stmp 6926138 May 8 13:52 RESTART/20210322.180000.MOM.res.nc
-rw-r--r-- 1 Jiande.Wang stmp 6926138 May 8 13:56 RESTART/20210323.060000.MOM.res.nc

DeniseWorthen commented 5 months ago

@dpsarmie Could you please check Jiande's run and see if you can spot the issue?

dpsarmie commented 5 months ago

@jiandewang cpld_control_c48 uses parm/ufs.configure.s2s_esmf.IM but that file does not have the configuration option active.

Go ahead and add write_restart_at_endofrun = .true. and rerun the test. That should (hopefully) solve the problem.

dpsarmie commented 5 months ago

@DeniseWorthen We can talk about whether or not we should add that option to the other ufs.configure files. Currently, only the HAFS and DATM ufs.configure files have the option in the configuration files.

jiandewang commented 5 months ago

@dpsarmie but if you see /scratch1/NCEPDEV/stmp2/Jiande.Wang/FV3_RT/rt_272814/cpld_control_c48_inte-TDW/ufs.configure, I have that in line 124

dpsarmie commented 5 months ago

Ok I see that. I'll try to run the c48 case and see if I can replicate the issue.

DeniseWorthen commented 5 months ago

@dpsarmie When we add this to UWM, we will want to add a configure variable to all the ufs.configure files, but in the RT system, this could be false by default. The G-W will then be able to set it true if they need to.

dpsarmie commented 5 months ago

Sounds good.


@jiandewang , I'm seeing that this is False in your mediator.log: (med_phases_restart_alarm_init) write_restart_at_endofrun : F (Line 537)

I haven't found what is causing the mediator flag to be set incorrectly, but I'll keep looking tomorrow.
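
For anyone repeating this check, a small script can pull the flag out of mediator.log. This is a minimal sketch that assumes the log line has the format quoted above; `read_endofrun_flag` is a hypothetical helper, not an existing tool.

```python
import re

# Matches lines such as:
# (med_phases_restart_alarm_init) write_restart_at_endofrun : F
FLAG_LINE = re.compile(r"write_restart_at_endofrun\s*:\s*([TF])\b")


def read_endofrun_flag(log_text):
    """Return True/False for the last reported flag value, or None if absent."""
    matches = FLAG_LINE.findall(log_text)
    if not matches:
        return None
    return matches[-1] == "T"


sample = "(med_phases_restart_alarm_init) write_restart_at_endofrun : F"
print(read_endofrun_flag(sample))  # False
```

In practice the text would come from reading mediator.log in the run directory, for example with `open("mediator.log").read()`.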

DeniseWorthen commented 4 months ago

I think the issue might be that Jiande used "true" vs ".true."

dpsarmie commented 4 months ago

I have a "true" case queued up on Hera right now. I figured that the parser would handle either case correctly but I'll wait and see what this test shows.

DeniseWorthen commented 4 months ago

In CMEPS, it expects ".true."

https://github.com/NOAA-EMC/CMEPS/blob/4e19850cb083bc474b7cde5dc2f8506ec74cc442/mediator/med_phases_restart_mod.F90#L101-L106

dpsarmie commented 4 months ago

actually I tried .true. without success

Ok. Denise is right though, I did a test run with "true" and it wasn't parsed correctly. I'll keep looking through your logs and see if there's any other issues.
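
The test results in this thread are consistent with an exact string match on the attribute value. The sketch below only models that observed behavior in Python; it is not the CMEPS code, and `flag_is_set` is a hypothetical name.

```python
def flag_is_set(raw_value):
    """Model of a strict match: only the literal '.true.' enables the flag.

    Assumption: the value is compared as a string after trimming whitespace,
    so 'true' (without the dots) leaves the flag at its default of False.
    """
    return raw_value.strip() == ".true."


print(flag_is_set(".true."))  # True
print(flag_is_set("true"))    # False
```

A more forgiving parser would accept several spellings, but with a strict match the safest choice in ufs.configure is to write `.true.` exactly.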

DeniseWorthen commented 4 months ago

yes, just looking through some of the other logicals and it doesn't seem to matter. weird.

jiandewang commented 4 months ago

just re-submitted my test case with .true.; this time mediator.log shows (med_phases_restart_alarm_init) write_restart_at_endofrun : T

let's wait a couple of minutes to see what happens

jiandewang commented 4 months ago

using .true. gives me what I am expecting. now let me try run length=24, restart_n=18 to see what will happen

DeniseWorthen commented 4 months ago

@jiandewang Please check that you also get mediator cpl.r files at the same times that you get MOM6 restarts. In the end, when also including the CICE changes, we need all three components to have the capability to write at restart_n and at the end.

jiandewang commented 4 months ago

@DeniseWorthen yes we got that file
/scratch1/NCEPDEV/stmp2/Jiande.Wang/FV3_RT/rt_272814/cpld_control_c48_inte-TDW1/RESTART[119]ls -l MOM
-rw-r--r-- 1 Jiande.Wang stmp 6926138 May 9 10:03 20210322.180000.MOM.res.nc
-rw-r--r-- 1 Jiande.Wang stmp 6926138 May 9 10:07 20210323.060000.MOM.res.nc
-rw-r--r-- 1 Jiande.Wang stmp 6926138 May 9 10:10 20210323.120000.MOM.res.nc
/scratch1/NCEPDEV/stmp2/Jiande.Wang/FV3_RT/rt_272814/cpld_control_c48_inte-TDW1/RESTART[120]ls -l ufs.cpld.cpl*
-rw-r--r-- 1 Jiande.Wang stmp 8086484 May 9 10:03 ufs.cpld.cpl.r.2021-03-22-64800.nc
-rw-r--r-- 1 Jiande.Wang stmp 8086484 May 9 10:08 ufs.cpld.cpl.r.2021-03-23-21600.nc
-rw-r--r-- 1 Jiande.Wang stmp 8086484 May 9 10:10 ufs.cpld.cpl.r.2021-03-23-43200.nc

for my test case, run length=30, restart_n=12

jiandewang commented 4 months ago

my second try: run length=60, restart_n=18, works as expected:
/scratch1/NCEPDEV/stmp2/Jiande.Wang/FV3_RT/rt_272814/cpld_control_c48_inte-TDW3[185]ll RESTART/MOM
-rw-r--r-- 1 Jiande.Wang stmp 6926138 May 9 13:58 RESTART/20210323.000000.MOM.res.nc
-rw-r--r-- 1 Jiande.Wang stmp 6926138 May 9 14:05 RESTART/20210323.180000.MOM.res.nc
-rw-r--r-- 1 Jiande.Wang stmp 6926138 May 9 14:12 RESTART/20210324.120000.MOM.res.nc
-rw-r--r-- 1 Jiande.Wang stmp 6926138 May 9 14:14 RESTART/20210324.180000.MOM.res.nc

/scratch1/NCEPDEV/stmp2/Jiande.Wang/FV3_RT/rt_272814/cpld_control_c48_inte-TDW3[186]ll RESTART/ufs*
-rw-r--r-- 1 Jiande.Wang stmp 8086484 May 9 13:59 RESTART/ufs.cpld.cpl.r.2021-03-23-00000.nc
-rw-r--r-- 1 Jiande.Wang stmp 8086484 May 9 14:05 RESTART/ufs.cpld.cpl.r.2021-03-23-64800.nc
-rw-r--r-- 1 Jiande.Wang stmp 8086484 May 9 14:12 RESTART/ufs.cpld.cpl.r.2021-03-24-43200.nc
-rw-r--r-- 1 Jiande.Wang stmp 8086484 May 9 14:14 RESTART/ufs.cpld.cpl.r.2021-03-24-64800.nc
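
The check that MOM6 and mediator restarts land at the same times can also be scripted. The sketch below assumes the two file name patterns seen in the listings above (a date plus HHMMSS for MOM6, a date plus seconds of day for the mediator); the helper names are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Patterns inferred from the file names in the listings above (assumption).
MOM_RE = re.compile(r"(\d{8})\.(\d{6})\.MOM\.res\.nc$")
CPL_RE = re.compile(r"cpl\.r\.(\d{4}-\d{2}-\d{2})-(\d{5})\.nc$")


def mom_time(name):
    """Parse YYYYMMDD.HHMMSS from a MOM6 restart file name."""
    m = MOM_RE.search(name)
    if m is None:
        raise ValueError(f"not a MOM restart name: {name}")
    return datetime.strptime(m.group(1) + m.group(2), "%Y%m%d%H%M%S")


def cpl_time(name):
    """Parse YYYY-MM-DD plus seconds of day from a mediator restart name."""
    m = CPL_RE.search(name)
    if m is None:
        raise ValueError(f"not a mediator restart name: {name}")
    day = datetime.strptime(m.group(1), "%Y-%m-%d")
    return day + timedelta(seconds=int(m.group(2)))


mom_files = [
    "20210323.000000.MOM.res.nc",
    "20210323.180000.MOM.res.nc",
    "20210324.120000.MOM.res.nc",
    "20210324.180000.MOM.res.nc",
]
cpl_files = [
    "ufs.cpld.cpl.r.2021-03-23-00000.nc",
    "ufs.cpld.cpl.r.2021-03-23-64800.nc",
    "ufs.cpld.cpl.r.2021-03-24-43200.nc",
    "ufs.cpld.cpl.r.2021-03-24-64800.nc",
]

mom_times = sorted(mom_time(f) for f in mom_files)
cpl_times = sorted(cpl_time(f) for f in cpl_files)
print(mom_times == cpl_times)  # True
```

The same comparison could be extended to the CICE restart files once their naming pattern is known.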

I am going to ask NCAR to test on their side to make sure it won't break their system

jiandewang commented 4 months ago

@dpsarmie MOM6 dev/emc just had an update. Can you sync your branch? I will reach out to NCAR after you sync your branch. Thanks

dpsarmie commented 4 months ago

@jiandewang , I've updated the branch. Thanks again for testing and the help.

jiandewang commented 4 months ago

we now have the green light from NCAR. I will prepare a UWM PR so this can be merged to dev/emc

jiandewang commented 4 months ago

combined with UWM (https://github.com/ufs-community/ufs-weather-model/pull/2205)

FernandoAndrade-NOAA commented 4 months ago

Testing for #2205 has completed successfully, please continue with merging this PR.