matthewhoffman commented 5 months ago

This merge adds improvements to the hydro_radial test cases to make them more useful

adjust decomposition_test to exactly perform the one-month drift test that steady_state_drift_test runs (results of this are in the MALI GMD paper and the Bueler and van Pelt paper)
adjust restart_test to perform the drift test for two months instead of initializing to no water and running for two years
set Visualize step to always be run for all four tests
Let spinup, drift test and restart test use up to 128 procs (but keep decomp at 1 and 3)

Checklist

[x] User's Guide has been updated
[x] Developer's Guide has been updated
[x] Documentation has been built locally and changes look as expected
[x] Document (in a comment titled Testing in this PR) any testing that was used to verify the changes

matthewhoffman commented 5 months ago

Testing

I set up the four tests using: compass setup -n 85 86 88 87 -w $scratch5/compass-hydro-radial-tests

Results:

landice/hydro_radial/decomposition_test
  * step: setup_mesh
  * step: 1proc_run
  * step: visualize_1proc_run
  * step: 3proc_run
  * step: visualize_3proc_run
  test execution:      SUCCESS
  test validation:     PASS
  test runtime:        00:11
landice/hydro_radial/restart_test
  * step: setup_mesh
  * step: full_run
  * step: visualize_full_run
  * step: restart_run
  * step: visualize_restart_run
  test execution:      SUCCESS
  test validation:     PASS
  test runtime:        00:54
landice/hydro_radial/steady_state_drift_test
  * step: setup_mesh
  * step: run_model
  * step: visualize
  test execution:      SUCCESS
  test runtime:        00:10
landice/hydro_radial/spinup_test
  * step: setup_mesh
  * step: run_model
  * step: visualize
  test execution:      SUCCESS
  test runtime:        09:18
Test Runtimes:
00:11 PASS landice_hydro_radial_decomposition_test
00:54 PASS landice_hydro_radial_restart_test
00:10 PASS landice_hydro_radial_steady_state_drift_test
09:18 PASS landice_hydro_radial_spinup_test
Total runtime 10:35
PASS: All passed successfully!

All looks good, but I'm surprised landice/hydro_radial/restart_test takes nearly a minute. Might be a fluke so I will check what it is when I run the full_integration suite.

matthewhoffman commented 5 months ago

Testing, Part 2

Ran full_integration suite on Chicoma against compass/main (daff021c3). Both use E3SM repo hash 7067f38968. Results:

Test Runtimes:
00:12 PASS landice_dome_2000m_sia_restart_test
00:04 PASS landice_dome_2000m_sia_decomposition_test
00:04 PASS landice_dome_variable_resolution_sia_restart_test
00:02 PASS landice_dome_variable_resolution_sia_decomposition_test
00:21 PASS landice_enthalpy_benchmark_A
00:24 PASS landice_eismint2_decomposition_test
00:23 PASS landice_eismint2_enthalpy_decomposition_test
00:24 PASS landice_eismint2_restart_test
00:24 PASS landice_eismint2_enthalpy_restart_test
00:12 PASS landice_greenland_sia_restart_test
00:11 PASS landice_greenland_sia_decomposition_test
00:19 FAIL landice_hydro_radial_restart_test
00:11 FAIL landice_hydro_radial_decomposition_test
00:34 PASS landice_humboldt_mesh-3km_decomposition_test_velo-none_calving-none_subglacialhydro
00:36 PASS landice_humboldt_mesh-3km_restart_test_velo-none_calving-none_subglacialhydro
00:19 PASS landice_dome_2000m_fo_decomposition_test
00:17 PASS landice_dome_2000m_fo_restart_test
00:10 PASS landice_dome_variable_resolution_fo_decomposition_test
00:10 PASS landice_dome_variable_resolution_fo_restart_test
00:15 PASS landice_circular_shelf_decomposition_test
00:53 PASS landice_greenland_fo_decomposition_test
00:44 PASS landice_greenland_fo_restart_test
00:25 PASS landice_thwaites_fo_decomposition_test
00:32 PASS landice_thwaites_fo_restart_test
00:13 PASS landice_thwaites_fo-depthInt_decomposition_test
00:19 PASS landice_thwaites_fo-depthInt_restart_test
00:37 PASS landice_humboldt_mesh-3km_restart_test_velo-fo_calving-von_mises_stress_damage-threshold_faceMelting
00:20 PASS landice_humboldt_mesh-3km_restart_test_velo-fo-depthInt_calving-von_mises_stress_damage-threshold_faceMelting
Total runtime 09:38
FAIL: 2 tests failed, see above.

The two hydro_radial tests pass validation but fail baseline comparison as expected and can be blessed:

landice/hydro_radial/restart_test
  * step: setup_mesh
  * step: full_run
  * step: visualize_full_run
  * step: restart_run
  * step: visualize_restart_run
  test execution:      SUCCESS
  test validation:     PASS
  baseline comparison: FAIL
  see: case_outputs/landice_hydro_radial_restart_test.log
  test runtime:        00:19
landice/hydro_radial/decomposition_test
  * step: setup_mesh
  * step: 1proc_run
  * step: visualize_1proc_run
  * step: 3proc_run
  * step: visualize_3proc_run
  test execution:      SUCCESS
  test validation:     PASS
  baseline comparison: FAIL
  see: case_outputs/landice_hydro_radial_decomposition_test.log
  test runtime:        00:11

Note the landice/hydro_radial/restart_test run time is 19 seconds here, so the 58 seconds in the previous test was probably a fluke and can be ignored.

Note that the hydro_radial runtimes are a bit longer than the baseline. I ran the integration suite a second time and got essentially the same times for these two tests. Presumably the additional runtime comes mostly from calling the visualization step. I think the extra cost is acceptable, but we could disable the viz step again (but one could run it manually after the integration suite was run if needed.) @trhille , if you prefer that, let me know.

Baseline run times:

00:07 PASS landice_hydro_radial_restart_test
00:05 PASS landice_hydro_radial_decomposition_test

alexolinhager commented 4 months ago

@matthewhoffman, this PR looks good to me, just found one minor typo. No changes to the script itself. I ran the full integration on Perlmutter and got the same results as you, apart from slightly longer run times for the test case. That seems like it was most likely a fluke because the baseline was no different.

Screenshot 2024-04-24 at 5 10 09 PM Screenshot 2024-04-24 at 5 09 57 PM Baseline:

matthewhoffman commented 4 months ago

@trhille , can you approve this PR? For some reason the automated checks are not working correctly - it says "Merging is blocked. Merging can be performed automatically with 1 approving review." but Alex has approved it. I checked the repo rules and only 1 review approval is required and there are no restrictions on who that is from. But that doesn't seem to be working so maybe another review approval will unstick it.

trhille commented 4 months ago

I'm guess @alexolinhager doesn't have write access?

alexolinhager commented 4 months ago

Good question, how would I check that? I'm assuming that's the case because it works now that you approved changes

matthewhoffman commented 4 months ago

I didn't think write access is a requirement based on the Github docs. I'll try to dig deeper. In any case, thanks, @trhille

MPAS-Dev / compass

Improve hydro_radial test cases #811

Testing

Testing, Part 2