E3SM-Project / polaris

Testing and analysis for OMEGA, MPAS-Ocean, MALI and MPAS-Seaice
BSD 3-Clause "New" or "Revised" License

Add geostrophic convergence test (Williamson Test 2) #120

Closed by xylar 1 year ago

xylar commented 1 year ago

This merge adds the geostrophic convergence test case from Williamson et al. 1992. The test case starts out in geostrophic balance and ideally remains in steady state.

The test case has similarities to cosine_bell, so efforts have been made to share code between the two, primarily the forward step.
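For context, here is a minimal sketch of the analytic state from Williamson et al. (1992) test case 2 with the flow-orientation angle set to zero; the constants follow the paper, but treat this as illustrative rather than the exact code in the init step:

import numpy as np

# Illustrative constants from Williamson et al. (1992); not necessarily the
# exact values or layout used in the polaris init step.
sphere_radius = 6.37122e6     # m
omega = 7.292e-5              # rotation rate (1/s)
gravity = 9.80616             # m/s^2
u0 = 2.0 * np.pi * sphere_radius / (12.0 * 86400.0)  # ~38.6 m/s
gh0 = 2.94e4                  # g * h0 (m^2/s^2)


def geostrophic_state(lat):
    # zonal velocity and water-column thickness in geostrophic balance
    u = u0 * np.cos(lat)
    h = (gh0 - (sphere_radius * omega * u0 + 0.5 * u0 ** 2)
         * np.sin(lat) ** 2) / gravity
    return u, h

Since the state is steady, the model should hold these fields fixed, and any drift from them measures numerical error.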

Checklist

xylar commented 1 year ago

This is based off of #119 and will be rebased after that gets merged.

xylar commented 1 year ago

Thanks to @lconlon for starting out this test case with me in the Polaris Hackathon in May!

xylar commented 1 year ago

Testing

I have run all 4 geostrophic test cases on Chrysalis with Intel and OpenMPI. They ran successfully and produced the expected plots.

The order of convergence for water-column thickness on the QU grids is really terrible -- about 0.4. That's discouraging for MPAS-Ocean but not an indication of a problem with the tests. The Icos meshes do much better -- about 1.6.

QU: [convergence_h and convergence_vel plots]

Icos: [convergence_h and convergence_vel plots]
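For anyone reproducing these numbers, the observed orders quoted above can be estimated with a simple log-log fit of error against cell size. Here is a sketch with made-up error values (not the output of the shared analysis step):

import numpy as np

# Hypothetical mesh spacings (km) and L2 errors; substitute the values from
# the analysis step to recover the observed orders quoted above.
resolutions = np.array([480., 240., 120., 60.])
l2_errors = np.array([4.0e-2, 1.3e-2, 4.2e-3, 1.4e-3])

# The slope of log(error) vs. log(cell size) is the observed convergence order.
order, _ = np.polyfit(np.log10(resolutions), np.log10(l2_errors), 1)
print(f'observed order of convergence: {order:.2f}')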

xylar commented 1 year ago

Here are some plots of water-column thickness, u, v and normal velocity (for the icos 240km mesh):

[plots: final_h, final_u, final_v, final_norm_vel]

xylar commented 1 year ago

Here are diff plots of water-column thickness for the QU and Icos meshes with increasing resolution.

QU

[diff_h plots at 7 QU resolutions]

Icos

[diff_h plots at 4 Icos resolutions]

xylar commented 1 year ago

I need to update this to use #126, so it will likely make sense to wait on reviewing this until after that PR has gone in.

xylar commented 1 year ago

@cbegeman, just an update that this is coming along and now uses the shared analysis, but I need to test it.

cbegeman commented 1 year ago

@xylar Just an FYI that I'll be posting a PR soon with some additional clean-up of the convergence analysis step. We might want to get this reviewed and merged first so that you don't have to do another rebase.

xylar commented 1 year ago

@cbegeman and @sbrus89, this is ready for review (again)!

xylar commented 1 year ago

@cbegeman, thanks for catching those mistakes, which are now fixed. Let me know if there are other changes you would like me to make.

xylar commented 1 year ago

@cbegeman, I'd prefer to wait until @sbrus89 has time to look at this. That may very well mean that I will rebase onto #131 once that's done. That's fine, since either I will have to rebase this or you will have to rebase that, and I don't mind the extra work.

sbrus89 commented 1 year ago

@xylar, I will get to this on Monday. Thanks for your patience.

xylar commented 1 year ago

@sbrus89, that would be great!

xylar commented 1 year ago

@sbrus89, thanks very much for your review! I think I've taken care of your suggestions. As I wrote above, I'd prefer to stick with the default resolutions rather than define them in geostrophic.cfg.

sbrus89 commented 1 year ago

@xylar - I'm testing a local merge and am seeing this from polaris list:

Traceback (most recent call last):
  File "/lcrc/group/acme/sbrus/mambaforge/envs/dev_polaris_0.2.0/bin/polaris", line 33, in <module>
    sys.exit(load_entry_point('polaris', 'console_scripts', 'polaris')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lcrc/group/acme/sbrus/mambaforge/envs/dev_polaris_0.2.0/bin/polaris", line 25, in importlib_load_entry_point
    return next(matches).load()
           ^^^^^^^^^^^^^^^^^^^^
  File "/lcrc/group/acme/sbrus/mambaforge/envs/dev_polaris_0.2.0/lib/python3.11/importlib/metadata/__init__.py", line 202, in load
    module = import_module(match.group('module'))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lcrc/group/acme/sbrus/mambaforge/envs/dev_polaris_0.2.0/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/gpfs/fs1/home/sbrus/polaris_worktrees/williamson/polaris/__main__.py", line 8, in <module>
    from polaris import cache, list, setup, suite
  File "/gpfs/fs1/home/sbrus/polaris_worktrees/williamson/polaris/list.py", line 6, in <module>
    from polaris.components import get_components
  File "/gpfs/fs1/home/sbrus/polaris_worktrees/williamson/polaris/components.py", line 5, in <module>
    from polaris.ocean import Ocean
  File "/gpfs/fs1/home/sbrus/polaris_worktrees/williamson/polaris/ocean/__init__.py", line 4, in <module>
    from polaris.ocean.tasks.geostrophic import add_geostrophic_tasks
  File "/gpfs/fs1/home/sbrus/polaris_worktrees/williamson/polaris/ocean/tasks/geostrophic/__init__.py", line 6, in <module>
    from polaris.ocean.tasks.geostrophic.analysis import Analysis
  File "/gpfs/fs1/home/sbrus/polaris_worktrees/williamson/polaris/ocean/tasks/geostrophic/analysis.py", line 3, in <module>
    from polaris.ocean.convergence.spherical import SphericalConvergenceAnalysis
ImportError: cannot import name 'SphericalConvergenceAnalysis' from 'polaris.ocean.convergence.spherical' (/gpfs/fs1/home/sbrus/polaris_worktrees/williamson/polaris/ocean/convergence/spherical/__init__.py)

cbegeman commented 1 year ago

@xylar Maybe it's worth rebasing onto https://github.com/E3SM-Project/polaris/pull/131 in the process of addressing @sbrus89's issue? Sorry for the hassle.

xylar commented 1 year ago

@sbrus89, I'm not able to reproduce that issue locally. Could you try recreating your polaris conda environment to see if that helps?

Update: after rebasing, I see the issue. Now I get it: you did a local test merge with main?

xylar commented 1 year ago

But, yes, I'll rebase tomorrow and you can also try again after that. It might be best to wait.

xylar commented 1 year ago

@sbrus89, sorry for the confusion on my part yesterday. I was too tired, I guess, and I missed that this was with a test merge. I have rebased and fixed things up. I'm retesting now, but I think you can give it another try when you have time.

sbrus89 commented 1 year ago

@xylar, no problem. Yes I was testing with a local merge with main. I'll give it another go today. Thanks for rebasing!

xylar commented 1 year ago

I successfully tested the convergence suite with the geostrophic tests added:

Task Runtimes:
0:01:01 PASS ocean/planar/inertial_gravity_wave
0:00:55 FAIL ocean/planar/manufactured_solution
0:05:27 PASS ocean/spherical/icos/cosine_bell
0:09:15 PASS ocean/spherical/qu/cosine_bell
0:01:35 PASS ocean/spherical/icos/geostrophic
0:03:02 PASS ocean/spherical/qu/geostrophic
Total runtime: 0:21:16

I was having some performance problems with the geostrophic/with_viz tests that seem random but persistent. I don't know if it's a Chrysalis problem, an issue with ESMF or something else. But I also hope we can replace this viz soon with something better, so I'm not inclined to investigate too much.

sbrus89 commented 1 year ago

@xylar, I'm seeing this error:

ocean/spherical/icos/geostrophic
Traceback (most recent call last):
  File "/global/cfs/cdirs/e3sm/sbrus/mambaforge/envs/dev_polaris_0.2.0/bin/polaris", line 33, in <module>
    sys.exit(load_entry_point('polaris', 'console_scripts', 'polaris')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/e3sm/sbrus/polaris_worktrees/williamson/polaris/__main__.py", line 62, in main
    commands[args.command]()
  File "/global/cfs/cdirs/e3sm/sbrus/polaris_worktrees/williamson/polaris/run/serial.py", line 196, in main
    run_tasks(suite_name='task', quiet=args.quiet, is_task=True,
  File "/global/cfs/cdirs/e3sm/sbrus/polaris_worktrees/williamson/polaris/run/serial.py", line 103, in run_tasks
    result_str, success, task_time = _log_and_run_task(
                                     ^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/e3sm/sbrus/polaris_worktrees/williamson/polaris/run/serial.py", line 316, in _log_and_run_task
    task.steps_to_run = _update_steps_to_run(
                        ^^^^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/e3sm/sbrus/polaris_worktrees/williamson/polaris/run/serial.py", line 222, in _update_steps_to_run
    step_str = config.get(task_name, 'steps_to_run').replace(',', ' ')
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/e3sm/sbrus/mambaforge/envs/dev_polaris_0.2.0/lib/python3.11/site-packages/mpas_tools/config.py", line 115, in get 
    return self.combined.get(section, option)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/e3sm/sbrus/mambaforge/envs/dev_polaris_0.2.0/lib/python3.11/configparser.py", line 797, in get 
    d = self._unify_values(section, vars)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/e3sm/sbrus/mambaforge/envs/dev_polaris_0.2.0/lib/python3.11/configparser.py", line 1168, in _unify_values
    raise NoSectionError(section) from None
configparser.NoSectionError: No section: 'icos_geostrophic'

I looked in the geostrophic.cfg file and only see an icos_geostrophic_with_viz section. I have a feeling this is because I set up the cases in this order: ocean/spherical/icos/geostrophic, then ocean/spherical/icos/geostrophic/with_viz, before running ocean/spherical/icos/geostrophic. Is this expected behavior?

xylar commented 1 year ago

@sbrus89, that does sound like a bug. Can you send me the commands you used to set up the 2 tasks? I want to make sure I can reproduce it. I always have been setting up both tasks together.

sbrus89 commented 1 year ago

@xylar - I did the following:

polaris setup -t ocean/spherical/icos/geostrophic -w $WORKROOT/wiliamson
polaris setup -t ocean/spherical/icos/geostrophic/with_viz -w $WORKROOT/wiliamson 
polaris setup -t ocean/spherical/qu/geostrophic -w $WORKROOT/wiliamson
polaris setup -t ocean/spherical/qu/geostrophic/with_viz -w $WORKROOT/wiliamson

and then

cd $WORKROOT/wiliamson/ocean/spherical/icos/geostrophic
sbatch job_script.sh

xylar commented 1 year ago

@sbrus89, great, thanks! I'm looking into this now.

Just as a tip to make your workflow more efficient, you may want to do:

polaris list

and then:

polaris setup -n 17 18 19 20 -w $WORKROOT/wiliamson

This will set up a custom suite with all 4 tasks. But it doesn't seem to expose the problem you showed so I'm glad you did things the "more tedious" way.

xylar commented 1 year ago

@sbrus89, I understand now what is going wrong, though I don't yet have a good idea what to do about it. It has nothing to do with this PR; it was caused by #125. When you separately set up 2 tasks that share the same config file, the first task writes a section with its steps_to_run option, and then the second task overwrites the config file with only its own section and steps_to_run. If you were to set them up together, both sections would be populated as expected:

[icos_geostrophic]

# A list of steps to include when running the icos_geostrophic task
# source: /gpfs/fs1/home/ac.xylar/e3sm_work/polaris/add-geostrophic-tests/polaris/setup.py
steps_to_run = icos_base_mesh_60km icos_init_60km icos_forward_60km icos_base_mesh_120km icos_init_120km icos_forward_120km icos_base_mesh_240km icos_init_240km icos_forward_240km icos_base_mesh_480km icos_init_480km icos_forward_480km analysis

[icos_geostrophic_with_viz]

# A list of steps to include when running the icos_geostrophic_with_viz task
# source: /gpfs/fs1/home/ac.xylar/e3sm_work/polaris/add-geostrophic-tests/polaris/setup.py
steps_to_run = icos_base_mesh_60km icos_init_60km icos_forward_60km icos_map_cell_60km icos_map_edge_60km icos_viz_60km icos_base_mesh_120km icos_init_120km icos_forward_120km icos_map_cell_120km icos_map_edge_120km icos_viz_120km icos_base_mesh_240km icos_init_240km icos_forward_240km icos_map_cell_240km icos_map_edge_240km icos_viz_240km icos_base_mesh_480km icos_init_480km icos_forward_480km icos_map_cell_480km icos_map_edge_480km icos_viz_480km analysis

Presumably, the solution will be to include steps_to_run from all tasks when writing out a shared config file. Since this is unrelated to geostrophic, I'll address it in a separate PR.
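For what it's worth, here is a rough sketch of that idea using plain configparser (the polaris config machinery is different, so this only illustrates the intended behavior, not the eventual implementation):

import configparser
import os


def write_shared_config(config, filename):
    # Before overwriting a shared config file, keep any sections that other
    # tasks wrote earlier (e.g. their steps_to_run), so that setting tasks up
    # one at a time no longer drops them.
    if os.path.exists(filename):
        existing = configparser.ConfigParser()
        existing.read(filename)
        for section in existing.sections():
            if section not in config:
                config[section] = dict(existing[section])
    with open(filename, 'w') as handle:
        config.write(handle)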

xylar commented 1 year ago

@sbrus89, would you like to review #134 and for me to merge that and rebase? Or can you review this without that fix?

sbrus89 commented 1 year ago

@xylar, I figured it wasn't related to this PR. I'll just review it without the fix. I'll try it by creating a custom suite as you suggested.

Just a thought, it would be nice to be able to create custom suites with the test case names as opposed to numbers since those change as new tests are added.

xylar commented 1 year ago

@sbrus89, that's a good idea and easy to implement. I'll make a PR shortly.

sbrus89 commented 1 year ago

@xylar, do you have any idea why I'd be seeing this:

Exception raised while running the steps of the task
Traceback (most recent call last):
  File "/global/cfs/cdirs/e3sm/sbrus/polaris_worktrees/williamson/polaris/run/serial.py", line 326, in _log_and_run_task
    baselines_passed = _run_task(task, available_resources)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/e3sm/sbrus/polaris_worktrees/williamson/polaris/run/serial.py", line 407, in _run_task
    _run_step(task, step, task.new_step_log_file,
  File "/global/cfs/cdirs/e3sm/sbrus/polaris_worktrees/williamson/polaris/run/serial.py", line 492, in _run_step
    step.runtime_setup()
  File "/global/cfs/cdirs/e3sm/sbrus/polaris_worktrees/williamson/polaris/model_step.py", line 373, in runtime_setup
    self._process_streams(quiet=quiet, remove_unrequested=False)
  File "/global/cfs/cdirs/e3sm/sbrus/polaris_worktrees/williamson/polaris/model_step.py", line 610, in _process_streams
    new_tree = yaml_to_mpas_streams(processed_registry_filename,
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/e3sm/sbrus/polaris_worktrees/williamson/polaris/yaml.py", line 224, in yaml_to_mpas_streams
    with open(processed_registry_filename, 'r') as reg_file:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/global/cfs/cdirs/e3sm/sbrus/polaris_worktrees/williamson/global/cfs/cdirs/e3sm/sbrus/polaris_worktrees/williamson/e3sm_submodules/E3SM-Project/components/mpas-ocean/src/Registry_processed.xml'

It seems like something is going on with creating the path to the Registry. Part of the path to the worktree is being pre-pended to the full correct path: /global/cfs/cdirs/e3sm/sbrus/polaris_worktrees/williamson/global/cfs/cdirs/e3sm/sbrus/polaris_worktrees/williamson/e3sm_submodules/E3SM-Project/components/mpas-ocean/src/Registry_processed.xml
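Not the actual polaris code, but a toy illustration of one way a doubled path like that can arise, namely a path that should stay absolute being treated as relative and joined onto a base directory:

import os

# Hypothetical example: if a stored path is treated as relative (here, its
# leading separator is stripped) and then joined onto a base directory, the
# result is the kind of doubled path seen in the traceback above.
base = '/global/cfs/cdirs/e3sm/sbrus/polaris_worktrees/williamson'
registry = ('/global/cfs/cdirs/e3sm/sbrus/polaris_worktrees/williamson/'
            'e3sm_submodules/E3SM-Project/components/mpas-ocean/src/'
            'Registry_processed.xml')

broken = os.path.join(base, registry.lstrip(os.sep))  # doubled path
fixed = os.path.abspath(registry)  # an absolute path passes through unchanged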

xylar commented 1 year ago

@sbrus89, hmm, that looks like a bug where a relative path didn't get turned into an absolute path as it should have before it got stored in the pickle file. I have never seen that before, so I would need to know exactly what your polaris setup or polaris suite command was that led to this. Maybe it was as simple as not supplying -p, which I basically always do?

sbrus89 commented 1 year ago

Sounds like it could be from not supplying -p. I don't typically do this unless I want an ocean_model outside the repo.

xylar commented 1 year ago

I nearly always use -p ../main/e3sm_submodules/E3SM-Project/components/mpas-ocean so I don't have to rebuild MPAS-Ocean in each worktree I use.

xylar commented 1 year ago

@sbrus89, I'm not able to reproduce this with main even without the -p flag. I doubt I introduced this issue in this PR. Could you give me the exact polaris setup or polaris suite command?

xylar commented 1 year ago

Oh, I take that back! I see it only in some tests and not others:

0:00:08 PASS ocean/planar/baroclinic_channel/10km/threads
0:00:03 PASS ocean/planar/baroclinic_channel/10km/decomp
0:00:00 FAIL ocean/planar/baroclinic_channel/10km/restart
0:00:01 FAIL ocean/planar/inertial_gravity_wave
0:00:02 PASS ocean/single_column/cvmix
0:00:02 PASS ocean/single_column/ideal_age

Again, not related to this PR but an important bug to fix.

xylar commented 1 year ago

@sbrus89, I see what changed and why. It's complicated to explain (to do with fancy config parsing and automatically converting to absolute paths) but probably easy to fix. Thanks for bringing this to my attention. Can you proceed with using the -p flag for now?

sbrus89 commented 1 year ago

@xylar - yes, will do. Sorry for the headache on this!

xylar commented 1 year ago

Thanks so much, @sbrus89! And I really do appreciate you finding those issues, whether they are related or not.

xylar commented 1 year ago

@cbegeman, thank you very much for getting this test case started with excellent documentation and for your thorough review! @lconlon, thank you for helping me get this test case started during the hackathon!