ESCOMP / SimpleLand

Simple Land Model for CESM --- *** IN DEVELOPMENT *** --- please contact for more info. See supplemental information of https://journals.ametsoc.org/doi/abs/10.1175/JCLI-D-18-0812.1 for a description of SLIM physics. Implementation of SLIM into the main CESM trunk is ongoing. SLIM currently works with the CESM2.1 release, but must be downloaded from this repository until we finish implementing it properly into the main CESM code.

mml_main cleanup to get past testing errors #67

Closed slevis-lmwg closed 1 year ago

slevis-lmwg commented 1 year ago

Relates to issue #42 which identifies 2 cheyenne tests and their 2 izumi equivalents as failing. The issue discusses:

I will run full test-suites to confirm that answers remain unchanged for all the tests. Then this PR will be ready for merging.

slevis-lmwg commented 1 year ago

izumi test-suite PASS except the pgi test doesn't build. I heard that pgi would be removed soon and maybe that has now happened?

cheyenne test-suite PASS

slevis-lmwg commented 1 year ago

As expected, the two izumi tests from #42 continue to fail, but they point to an error in the first timestep here:

cesm.exe 0000000000A0E82D mml_mainmod_mp_ph 2519 mml_main.F90

So I will pursue this further on izumi (rather than on cheyenne, where the new error shows up as an mpt error in timestep 1490 of case2).

slevis-lmwg commented 1 year ago

The two izumi tests from #42 continue to fail, but the latest error says:

[cli_54]: aborting job:
Fatal error in MPI_Irecv: Invalid datatype, error stack:
MPI_Irecv(153): MPI_Irecv(buf=0x7fe305fc7e20, count=1, INVALID DATATYPE, src=49, tag=241, comm=0xc4000012, request=0x7ffe6b153c40) failed
MPI_Irecv(103): Invalid datatype
[mpiexec@i042.cgd.ucar.edu] HYDT_bscd_pbs_wait_for_completion (tools/bootstrap/external/pbs_wait.c:67): tm_poll(obit_event) failed with TM error 17002
[mpiexec@i042.cgd.ucar.edu] HYDT_bsci_wait_for_completion (tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion

I tried one of the two as a nag test instead of an intel test, and it passed!

I will probably stop pursuing this error any further once I confirm that my mods haven't broken anything else.

slevis-lmwg commented 1 year ago

@ekluzek I have confirmed on izumi that my mods haven't changed answers. I'm running the cheyenne test-suite right now.

Meanwhile, the tests that were failing now stop at one of the new assert statements. I would like to discuss with you how to pursue this further or whether to table it for now. May I send an invite for a quick chat?

slevis-lmwg commented 1 year ago

@ekluzek recommended a code change that got one of the failing tests to PASS on cheyenne.

@slevisconsulting will rerun test-suites. @ekluzek is concerned that this will change answers.

slevis-lmwg commented 1 year ago

Good news: Cheyenne test-suite: PASS (no diffs from baseline). There's only one failing test left on cheyenne, and it's listed in the expected failures and in #17.

slevis-lmwg commented 1 year ago

Izumi test-suite PASS, except pgi has stopped working in the last couple of weeks; I assume this means that the pgi compiler has been removed.

@ekluzek if you are also comfortable with this, I can go ahead and merge/tag.

ekluzek commented 1 year ago

@slevisconsulting yes, PGI has been removed, so go ahead and merge this. You might as well remove the PGI test that no longer works. Going forward we should use the nvhpc compiler, which is the successor to PGI (PGI was bought out by NVIDIA).