ESCOMP / CDEPS

Community Data Models for Earth Prediction Systems
https://escomp.github.io/CDEPS/versions/master/html/index.html

[WIP] Plumber updates #262

Open TeaganKing opened 4 months ago

TeaganKing commented 4 months ago

This will address #248 in order to implement PLUMBER capabilities.

slevis-lmwg commented 2 months ago

From meeting with @ekluzek

ekluzek commented 2 months ago

A note on some things we need to do before this is ready to be merged:

slevis-lmwg commented 2 months ago

Agreed in today's ctsm software meeting: @TeaganKing will notify @slevis-lmwg when he should run the aux_cdeps test-suite.

TeaganKing commented 2 months ago

Per conversation with Erik, we can remove the files listed in the PLUMBER2 user mod directories because these will be implemented in another PR (#277); they do not need to be moved to CDEPS.

However, we do need to implement the dtlimit values used for these various streams specifically for PLUMBER. Hence the placeholder values: I still need to make sure they correctly change dtlimit when CLM_USRDAT_NAME is set to PLUMBER.

Variables in those user mod directories that are duplicated in the CDEPS stream can be removed once this PR is merged in.

TeaganKing commented 3 weeks ago

The dtlimit is updated as expected when I run CTSM after doing an xmlchange to set CLM_USRDAT_NAME to PLUMBER. So, @slevis-lmwg, I think we can run the aux_cdeps test-suite. Note that the CTSM changes (https://github.com/ESCOMP/CTSM/pull/2485 and https://github.com/ESCOMP/CTSM/pull/2406) are not yet available, since they depend on this CDEPS PR.

TeaganKing commented 3 weeks ago

This PR introduced CLM_USRDAT_NAME as PLUMBER2 instead of PLUMBER, so I will update that now.

slevis-lmwg commented 3 weeks ago

@TeaganKing I want to confirm that I understand. I need to combine the branches from these three PRs: https://github.com/ESCOMP/CTSM/pull/2485, https://github.com/ESCOMP/CTSM/pull/2406, and #262 before I start the aux_cdeps test-suite, right?

Also, a note to myself: The checklist points out that I need to generate a baseline.

TeaganKing commented 3 weeks ago

ESCOMP/CTSM#2406 is very much still in progress, and there are going to be a few changes to ESCOMP/CTSM#2485 still as well (I'll do this within the next few days). What exactly is being tested with the aux_cdeps tests? I personally tested this one just by doing an xmlchange to set CLM_USRDAT_NAME to PLUMBER2, building the case, and checking the input files.
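A minimal sketch of the "check the input files" step, for anyone repeating it: after the xmlchange and building (or running ./preview_namelists), list the dtlimit of every stream in the generated DATM stream file. It assumes the generated file (for example CaseDocs/datm.streams.xml) lists each stream as a `<stream_info name="...">` element with a `<dtlimit>` child; the file location, the element names, and the `dtlimits` helper are assumptions for illustration, not something confirmed in this thread.

```python
import sys
import xml.etree.ElementTree as ET


def dtlimits(xml_text):
    """Return {stream name: dtlimit or None} from stream-file XML text.

    Hypothetical helper; assumes <stream_info name="..."> elements, each
    with an optional <dtlimit> child, anywhere in the document.
    """
    root = ET.fromstring(xml_text)
    result = {}
    for stream in root.iter("stream_info"):
        text = stream.findtext("dtlimit")
        # Missing or empty <dtlimit> elements are reported as None.
        result[stream.get("name")] = float(text) if text and text.strip() else None
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name):
    #   python check_dtlimit.py CaseDocs/datm.streams.xml
    with open(sys.argv[1]) as f:
        for name, value in dtlimits(f.read()).items():
            print(f"{name}: dtlimit = {value}")
```

Comparing this output for a case with CLM_USRDAT_NAME=PLUMBER2 against one without it should show whether the PLUMBER-specific dtlimit values were picked up.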

slevis-lmwg commented 3 weeks ago

OK, based on this information, I think I can go ahead and submit aux_cdeps with #262 paired with ctsm from master (I will try ctsm5.2.007, which is the current latest).

slevis-lmwg commented 3 weeks ago

I tried and failed to generate a baseline using the latest ctsm paired with cdeps1.0.38, i.e., the same cdeps that I see in @TeaganKing's branch: ./run_sys_tests -s aux_cdeps --skip-compare -g cdeps1.0.38_ctsm5.2.008

I also tried and failed to generate a baseline using the latest ctsm paired with cdeps1.0.34, i.e., the default cdeps for ctsm5.2.008: ./run_sys_tests -s aux_cdeps --skip-compare -g cdeps1.0.34_ctsm5.2.008

The former failure is less surprising, e.g., if there are incompatibilities between ctsm5.2.008 and cdeps1.0.38.

The latter, though, means either that I have a problem running aux_cdeps (environment or something else?) or that aux_cdeps itself has a problem (in which case it should fail for others as well).

@TeaganKing at this point I will need help from @ekluzek with this. I will raise the issue at Monday's stand-up.

slevis-lmwg commented 2 weeks ago

I encountered the same problem this morning, even with aux_clm and ctsm_sci. This made me realize that the problem may be as simple as setting an account number that hasn't expired. I will try this again today or tomorrow.

UPDATE 1: I submitted the same two tests. I expect at least the cdeps1.0.34 one to work and generate a baseline.

UPDATE 2: This worked out the opposite of what I expected:

UPDATE 3: Submitted aux_cdeps, comparing this branch to the baseline (tests_0703-140019de): ./run_sys_tests -s aux_cdeps --skip-generate -c cdeps1.0.38_ctsm5.2.008

slevis-lmwg commented 2 weeks ago

@TeaganKing, two updates:
1) Erik clarified that the second checkbox (currently unchecked) is asking you to run one or more PLUMBER cases to confirm that they work.
2) aux_cdeps fails for several tests with this error during the build phase:

2024-07-03 14:01:30: Test 'SMS_Ld5.f10_f10_mg37.2000_DATM%NLDAS2_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel' failed in phase 'SETUP' with exception 'ERROR: Fatal error in case.cmpgen_namelists: 2024-07-03 14:01:29 atm
Create namelist for component datm
   Calling /glade/work/slevis/git_externals/plumber_upd_pr262b/components/cdeps/datm/cime_config/buildnml
   Running /glade/work/slevis/git_externals/plumber_upd_pr262b/components/cdeps/datm/cime_config/buildnml
Traceback (most recent call last):
  File "/glade/work/slevis/git_externals/plumber_upd_pr262b/components/cdeps/datm/cime_config/buildnml", line 336, in <module>
    _main_func()
  File "/glade/work/slevis/git_externals/plumber_upd_pr262b/components/cdeps/datm/cime_config/buildnml", line 332, in _main_func
    buildnml(case, caseroot, "datm")
  File "/glade/work/slevis/git_externals/plumber_upd_pr262b/components/cdeps/datm/cime_config/buildnml", line 311, in buildnml
    _create_namelists(case, confdir, inst_string, namelist_infile, nmlgen, data_list_path)
  File "/glade/work/slevis/git_externals/plumber_upd_pr262b/components/cdeps/datm/cime_config/buildnml", line 211, in _create_namelists
    streams = StreamCDEPS(stream_file, schema_file)
  File "/glade/work/slevis/git_externals/plumber_upd_pr262b/components/cdeps/datm/cime_config/../../cime_config/stream_cdeps.py", line 65, in __init__
    GenericXML.__init__(self, infile, schema)
  File "/glade/work/slevis/git_externals/plumber_upd_pr262b/cime/CIME/XML/generic_xml.py", line 78, in __init__
    self.read(infile, schema)
  File "/glade/work/slevis/git_externals/plumber_upd_pr262b/cime/CIME/XML/generic_xml.py", line 129, in read
    self.read_fd(fd)
  File "/glade/work/slevis/git_externals/plumber_upd_pr262b/cime/CIME/XML/generic_xml.py", line 159, in read_fd
    self.tree = ET.parse(fd)
  File "/glade/work/slevis/conda-envs/ctsm_pylib/lib/python3.7/xml/etree/ElementTree.py", line 1197, in parse
    tree.parse(source, parser)
  File "/glade/work/slevis/conda-envs/ctsm_pylib/lib/python3.7/xml/etree/ElementTree.py", line 598, in parse
    self._root = parser._parse_whole(source)
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 4083, column 15
ERROR: /glade/work/slevis/git_externals/plumber_upd_pr262b/components/cdeps/datm/cime_config/buildnml /glade/derecho/scratch/slevis/tests_0703-140019de/SMS_Ld5.f10_f10_mg37.2000_DATM%NLDAS2_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel.C.0703-140019de_int FAILED, see above'
  File "/glade/work/slevis/git_externals/plumber_upd_pr262b/cime/CIME/test_scheduler.py", line 1125, in _run_catch_exceptions
    return run(test)
  File "/glade/work/slevis/git_externals/plumber_upd_pr262b/cime/CIME/test_scheduler.py", line 1016, in _setup_phase
    "Fatal error in case.cmpgen_namelists: {}".format(output),
  File "/glade/work/slevis/git_externals/plumber_upd_pr262b/cime/CIME/utils.py", line 176, in expect
    raise exc_type(msg)
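To reproduce the ParseError outside of CIME and see exactly what sits at the reported position (line 4083, column 15), a minimal sketch like the one below may help. It assumes you pass it the stream XML file that StreamCDEPS was reading; the traceback does not show that path, so choosing the file is up to you, and `find_parse_error` is a hypothetical helper, not part of CIME or CDEPS. One common cause of "not well-formed (invalid token)" is an unescaped `&` or `<` in element text.

```python
import sys
import xml.etree.ElementTree as ET


def find_parse_error(xml_text):
    """Return None if xml_text is well-formed, else (line, column, offending line).

    ParseError.position is (line, column): the line is 1-based and the
    column is a 0-based offset within that line (as reported by expat).
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position
        # Split on "\n" only, so line numbering matches the parser's count.
        offending = xml_text.split("\n")[line - 1]
        return line, column, offending
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python find_xml_error.py <stream xml file>
    with open(sys.argv[1]) as f:
        print(find_parse_error(f.read()))
```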

slevis-lmwg commented 2 weeks ago

In case it helps, here's a list of tests that PASS versus FAIL:

    PASS SMS_Ld2.ww3a.2000_SATM_SLND_SICE_SOCN_SROF_SGLC_DWAV%CLIMO.derecho_intel RUN 
    PASS SMS_Ld3.f09_f09_mg17.1850_SATM_DLND%SCPL_SICE_SOCN_SROF_SGLC_SWAV.derecho_intel RUN 
    PASS SMS_Ly3.f10_f10_ais8gris4_mg37.2000_SATM_SLND_SICE_SGLC_SROF_DGLC%NOEVOLVE_SWAV.derecho_intel RUN 
    PASS SMS_Ly3.f10_f10_ais8_mg37.2000_SATM_SLND_SICE_SGLC_SROF_DGLC%NOEVOLVE_SWAV.derecho_intel RUN
    PASS SMS_Ly3.f19_g17_gris4.2000_SATM_SLND_SICE_SGLC_SROF_DGLC%NOEVOLVE_SWAV.derecho_intel RUN

As far as I can tell, the PEND tests report the same error as the one FAIL in this list:

    FAIL SMS_Ld5.f10_f10_mg37.1850_DATM%GSWP3v1_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel SHAREDLIB_BUILD
    PEND SMS_Ld5.f10_f10_mg37.2000_DATM%CRUv7_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel SHAREDLIB_BUILD
    PEND SMS_Ld5.f10_f10_mg37.2000_DATM%NLDAS2_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel SHAREDLIB_BUILD
    PEND SMS_Ld5.f10_f10_mg37.2000_DATM%QIA_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel SHAREDLIB_BUILD
    PEND SMS_Ld5.f10_f10_mg37.2010_DATM%GSWP3v1_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel SHAREDLIB_BUILD
    PEND SMS_Ld5.f10_f10_mg37.HIST_DATM%GSWP3v1_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel SHAREDLIB_BUILD
    PEND SMS_Ld5.f10_f10_mg37.SSP585_DATM%GSWP3v1_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel SHAREDLIB_BUILD
    PEND SMS_Ld5_P1.1x1_mexicocityMEX.2000_DATM%1PT_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel.datm-1PT SHAREDLIB_BUILD
    PEND SMS_Ld5.T62_g17.2000_DATM%IAF_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel SHAREDLIB_BUILD
    PEND SMS_Ld5.T62_g17.2000_DATM%NYF_SLND_DICE%IAF_DOCN%DOM_SROF_SGLC_SWAV_SESP.derecho_intel SHAREDLIB_BUILD
    PEND SMS_Ld5.T62_g17.2000_DATM%NYF_SLND_DICE%SSMI_DOCN%DOM_SROF_SGLC_SWAV_SESP.derecho_intel SHAREDLIB_BUILD
    PEND SMS_Ld5.T62_g17.2000_DATM%NYF_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel SHAREDLIB_BUILD
    PEND SMS_Ld5.TL319_t061.2000_DATM%JRA-1p4-2018_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel SHAREDLIB_BUILD
    PEND SMS_Ld5.TL319_t061.2000_DATM%JRA_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel SHAREDLIB_BUILD
    PEND SMS_Ln5.f19_f19_mg17.2000_DATM%QIA_SLND_SICE_DOCN%DOM_SROF_SGLC_SWAV.derecho_intel SHAREDLIB_BUILD
    PEND SMS_Ln5.f19_f19_mg17.2000_DATM%QIA_SLND_SICE_DOCN%SOMAQP_SROF_SGLC_SWAV.derecho_intel SHAREDLIB_BUILD
    PEND SMS_Ln5.f19_f19_mg17.HIST_DATM%QIA_SLND_SICE_DOCN%DOM_SROF_SGLC_SWAV.derecho_intel SHAREDLIB_BUILD
    PEND SMS_Ln9_P1.T42_T42.2000_DATM%QIA_SLND_SICE_DOCN%DOM_SROF_SGLC_SWAV.derecho_intel.datm-scam SHAREDLIB_BUILD

slevis-lmwg commented 2 weeks ago

A quick look at the lists above suggests that SATM tests PASS and DATM tests FAIL.
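To double-check that split across the full lists, a small sketch like the following tallies results by atmosphere component. It assumes each line has the form `STATUS TESTNAME PHASE`, as in the lists above, and that the compset is the third dot-separated field of the test name (e.g. `2000_DATM%NLDAS2_SLND_...`); `tally` is a hypothetical helper, not a CIME tool.

```python
from collections import Counter


def tally(status_lines):
    """Count (atm component, status) pairs from 'STATUS TESTNAME PHASE' lines."""
    counts = Counter()
    for line in status_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        status, test_name = fields[0], fields[1]
        parts = test_name.split(".")
        if len(parts) < 3:
            continue  # no compset field to inspect
        atm = "other"
        # Compset components are "_"-separated; options follow a "%".
        for component in parts[2].split("_"):
            name = component.split("%")[0]
            if name in ("DATM", "SATM"):
                atm = name
                break
        counts[(atm, status)] += 1
    return counts


if __name__ == "__main__":
    example = [
        "PASS SMS_Ld2.ww3a.2000_SATM_SLND_SICE_SOCN_SROF_SGLC_DWAV%CLIMO.derecho_intel RUN",
        "FAIL SMS_Ld5.f10_f10_mg37.1850_DATM%GSWP3v1_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel SHAREDLIB_BUILD",
    ]
    for (atm, status), n in sorted(tally(example).items()):
        print(atm, status, n)
```

Pasting both full lists into `example` would show whether any DATM test passed or any SATM test failed.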

TeaganKing commented 1 week ago

Thank you for running these tests and clarifying the 2nd checkbox item!

Regarding actually running the PLUMBER case, we don't have run_tower() fully functioning at the moment. Would it be most helpful to merge this in first and then finalize run_tower(), since run_tower() will require these changes?

slevis-lmwg commented 1 week ago

@TeaganKing if your suggestion does not affect whether the aux_cdeps test-suite can be fixed in this PR (which I suspect and hope is true), then I would be fine with that. Still, I would like @ekluzek to also weigh in on your suggestion.