ESCOMP / CMEPS

NUOPC Community Mediator for Earth Prediction Systems
https://escomp.github.io/CMEPS/

Can't run with HIST_OPTION=nstep? #311

Closed mnlevy1981 closed 2 years ago

mnlevy1981 commented 2 years ago

I am trying to run for a couple of days with a coupler history file written every time step, to see what the fields coming into MOM6 look like. I ran

./xmlchange HIST_N=1,HIST_OPTION=nstep

and now I'm getting PET####.ESMF_LogFile output with messages like

$ cat PET0000.ESMF_LogFile
20221003 101948.374 ERROR            PET0000 (med_time_alarmInit): unknown option nstep
20221003 101948.374 ERROR            PET0000 ESMF_Alarm.F90:1445 ESMF_AlarmSet() Object being used before creation  - Bad Object
20221003 101948.374 ERROR            PET0000 med_phases_history_mod.F90:224 Object being used before creation  - Passing error in return code
20221003 101948.374 ERROR            PET0000 MED:src/addon/NUOPC/src/NUOPC_ModelBase.F90:2215 Object being used before creation  - Passing error in return code
20221003 101948.374 ERROR            PET0000 /glade/p/cesmdata/cseg/PROGS/build/28560/esmf-8.2.0b23/src/Infrastructure/Trace/src/ESMCI_Trace.C:1512 ESMCI:TraceEventPhaseExit() Wrong argument specified  - Trace regions not properly nested exiting from region: [MED] med_phases_history_write Expected exit from: MED:(med_phases_history_write)
20221003 101948.374 ERROR            PET0000 /glade/p/cesmdata/cseg/PROGS/build/28560/esmf-8.2.0b23/src/Infrastructure/Trace/src/ESMCI_Trace.C:1470 ESMCI::TraceEventCompPhaseExit() Wrong argument specified  - Internal subroutine call returned Error
20221003 101948.374 ERROR            PET0000 ESMCI_FTable.C:832 ESMCI_FTableCallEntryPointVMHop Wrong argument specified  - Internal subroutine call returned Error
20221003 101948.374 ERROR            PET0000 ESMCI_FTable.C:1100 c_esmc_compwait Wrong argument specified  - Internal subroutine call returned Error
20221003 101948.374 ERROR            PET0000 ESMF_Comp.F90:1255 ESMF_CompExecute Wrong argument specified  - Internal subroutine call returned Error
20221003 101948.374 ERROR            PET0000 ESMF_GridComp.F90:1905 ESMF_GridCompRun Wrong argument specified  - Internal subroutine call returned Error
20221003 101948.374 ERROR            PET0000 ESM0001:src/addon/NUOPC/src/NUOPC_Driver.F90:3338 Wrong argument specified  - Failed calling phase med_phases_history_write Run for modelComp 1
20221003 101948.374 ERROR            PET0000 ESM0001:src/addon/NUOPC/src/NUOPC_Driver.F90:3609 Wrong argument specified  - Passing error in return code
20221003 101948.374 ERROR            PET0000 ESM0001:src/addon/NUOPC/src/NUOPC_Driver.F90:3258 Wrong argument specified  - Passing error in return code
20221003 101948.374 ERROR            PET0000 /glade/p/cesmdata/cseg/PROGS/build/28560/esmf-8.2.0b23/src/Infrastructure/Trace/src/ESMCI_Trace.C:1512 ESMCI:TraceEventPhaseExit() Wrong argument specified  - Trace regions not properly nested exiting from region: [ESM0001] RunPhase1 Expected exit from: MED:(med_phases_history_write)
20221003 101948.374 ERROR            PET0000 /glade/p/cesmdata/cseg/PROGS/build/28560/esmf-8.2.0b23/src/Infrastructure/Trace/src/ESMCI_Trace.C:1470 ESMCI::TraceEventCompPhaseExit() Wrong argument specified  - Internal subroutine call returned Error
20221003 101948.374 ERROR            PET0000 ESMCI_FTable.C:832 ESMCI_FTableCallEntryPointVMHop Wrong argument specified  - Internal subroutine call returned Error
20221003 101948.374 ERROR            PET0000 ESMCI_FTable.C:1100 c_esmc_compwait Wrong argument specified  - Internal subroutine call returned Error
20221003 101948.374 ERROR            PET0000 ESMF_Comp.F90:1255 ESMF_CompExecute Wrong argument specified  - Internal subroutine call returned Error
20221003 101948.374 ERROR            PET0000 ESMF_GridComp.F90:1905 ESMF_GridCompRun Wrong argument specified  - Internal subroutine call returned Error
20221003 101948.374 ERROR            PET0000 ensemble:src/addon/NUOPC/src/NUOPC_Driver.F90:3338 Wrong argument specified  - Failed calling phase RunPhase1 Run for modelComp 1
20221003 101948.374 ERROR            PET0000 ensemble:src/addon/NUOPC/src/NUOPC_Driver.F90:3609 Wrong argument specified  - Passing error in return code
20221003 101948.374 ERROR            PET0000 ensemble:src/addon/NUOPC/src/NUOPC_Driver.F90:3258 Wrong argument specified  - Passing error in return code
20221003 101948.374 ERROR            PET0000 /glade/p/cesmdata/cseg/PROGS/build/28560/esmf-8.2.0b23/src/Infrastructure/Trace/src/ESMCI_Trace.C:1512 ESMCI:TraceEventPhaseExit() Wrong argument specified  - Trace regions not properly nested exiting from region: [ensemble] RunPhase1 Expected exit from: MED:(med_phases_history_write)
20221003 101948.376 ERROR            PET0000 /glade/p/cesmdata/cseg/PROGS/build/28560/esmf-8.2.0b23/src/Infrastructure/Trace/src/ESMCI_Trace.C:1470 ESMCI::TraceEventCompPhaseExit() Wrong argument specified  - Internal subroutine call returned Error
20221003 101948.376 ERROR            PET0000 ESMCI_FTable.C:832 ESMCI_FTableCallEntryPointVMHop Wrong argument specified  - Internal subroutine call returned Error
20221003 101948.376 ERROR            PET0000 ESMCI_FTable.C:1100 c_esmc_compwait Wrong argument specified  - Internal subroutine call returned Error
20221003 101948.376 ERROR            PET0000 ESMF_Comp.F90:1255 ESMF_CompExecute Wrong argument specified  - Internal subroutine call returned Error
20221003 101948.376 ERROR            PET0000 ESMF_GridComp.F90:1905 ESMF_GridCompRun Wrong argument specified  - Internal subroutine call returned Error
20221003 101948.398 INFO             PET0000 ESMF_GridCompDestroy called
20221003 101948.398 INFO             PET0000 ESMF_GridCompDestroy finished
20221003 101948.398 INFO             PET0000 esmApp FINISHED
20221003 101948.398 INFO             PET0000 Finalizing ESMF

The job appeared to hang, so I killed it after ~10 minutes. Currently trying with nsteps instead (nstep is listed as a valid option in env_run.xml but maybe CMEPS only likes the plural option?). If this fails, I'll try nhours.

I'm running a modified cesm2_3_beta09 sandbox, with the cmeps0.13.68 tag (and cime6.0.46, if that matters).
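Since every line after the first in the log above is a cascading error, the root cause is easiest to spot by pulling out the first ERROR entry. A minimal sketch in Python, assuming the line layout seen in the excerpt (date, time, level, PET id, message); the `first_error` helper and the regular expression are illustrative and not part of ESMF or CMEPS:

```python
import re

# Assumed layout, inferred from the excerpt above:
# "YYYYMMDD HHMMSS.mmm LEVEL   PETnnnn message"
LOG_LINE = re.compile(
    r"^(?P<date>\d{8}) (?P<time>\d{6}\.\d{3}) "
    r"(?P<level>[A-Z]+)\s+(?P<pet>PET\d+) (?P<msg>.*)$"
)


def first_error(lines):
    """Return the message of the first ERROR line, or None if there is none."""
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == "ERROR":
            return match.group("msg")
    return None


# Three lines copied from the log excerpt above.
sample = [
    "20221003 101948.374 ERROR            PET0000 (med_time_alarmInit): unknown option nstep",
    "20221003 101948.374 ERROR            PET0000 ESMF_Alarm.F90:1445 ESMF_AlarmSet() Object being used before creation  - Bad Object",
    "20221003 101948.398 INFO             PET0000 esmApp FINISHED",
]

print(first_error(sample))
```

For the excerpt above this singles out the `unknown option nstep` message from `med_time_alarmInit`; the later entries are consequences of that first failure.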

mnlevy1981 commented 2 years ago

Still waiting on my nsteps job to start, but judging from the case options in https://github.com/ESCOMP/CMEPS/blob/ef360eabd92e5dac3e3bae6e553c13fdea87d252/mediator/med_time_mod.F90#L150-L241, I think the culprit is a missing case (optNStep). I don't see any singular optN____ cases at all, so presumably nsecond, nminute, nhour, nday, nmonth, and nyear will all trigger this error.
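One way to accept both spellings is to map each singular name onto its plural form before the option is matched. The following is only a sketch of that idea in Python; the real logic lives in the Fortran source linked above, and `normalize_option` and `SINGULAR_TO_PLURAL` are hypothetical names, not CMEPS identifiers:

```python
# Hypothetical helper, not CMEPS code: fold singular option names
# (nstep, nhour, ...) into the plural forms (nsteps, nhours, ...).
UNITS = ("step", "second", "minute", "hour", "day", "month", "year")
SINGULAR_TO_PLURAL = {f"n{unit}": f"n{unit}s" for unit in UNITS}


def normalize_option(option):
    """Return the plural spelling for a singular option; pass others through."""
    opt = option.strip()
    return SINGULAR_TO_PLURAL.get(opt, opt)


for name in ("nstep", "nsteps", "nyear", "never"):
    print(name, "->", normalize_option(name))
```

Options that have no singular/plural pair, such as `never`, are returned unchanged, so a single normalization step in front of the existing case selection would be enough under this approach.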

mvertens commented 2 years ago

@mnlevy1981 - I've used every time step output many times with no problem. Can you please let me know how to reproduce your case?

mvertens commented 2 years ago

@mnlevy1981 - according to @jedwards4b you need nsteps - but we should be updating CMEPS to handle either one.

mnlevy1981 commented 2 years ago

I'm okay with using nsteps instead of nstep (and that did work for me), but the valid_values listed in the XML file make it seem like either should work:

    <entry id="HIST_OPTION" value="nsteps">
      <type>char</type>
      <valid_values>none,never,nsteps,nstep,nseconds,nsecond,nminutes,nminute,nhours,nhour,ndays,nday,nmonths,nmonth,nyears,nyear,date,ifdays0,end</valid_values>
      <desc>Sets driver snapshot history file frequency (like REST_OPTION)</desc>
    </entry>

It's great if the plan is to update CMEPS to handle either; if that ends up getting put on the back burner, could we update the valid_values so that nstep is no longer accepted? Otherwise I'm happy to leave everything as-is and close this issue when the singular value is acceptable again.
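For reference, the singular entries in question can be listed mechanically from the XML entry quoted above. A small Python sketch, where `singular_duplicates` is an illustrative helper and not part of CIME; it reports every value whose plural counterpart is also listed:

```python
import xml.etree.ElementTree as ET

# The HIST_OPTION entry as quoted above.
ENTRY = """<entry id="HIST_OPTION" value="nsteps">
  <type>char</type>
  <valid_values>none,never,nsteps,nstep,nseconds,nsecond,nminutes,nminute,nhours,nhour,ndays,nday,nmonths,nmonth,nyears,nyear,date,ifdays0,end</valid_values>
  <desc>Sets driver snapshot history file frequency (like REST_OPTION)</desc>
</entry>"""


def singular_duplicates(entry_xml):
    """List valid_values entries whose plural form (value + 's') is also listed."""
    entry = ET.fromstring(entry_xml)
    values = entry.findtext("valid_values").split(",")
    return [value for value in values if value + "s" in values]


print(singular_duplicates(ENTRY))
```

These are the values that would either need a matching case in the mediator or need to be dropped from valid_values.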

mvertens commented 2 years ago

@mlevy - thanks for catching this. I will update CMEPS to handle either.