xylar closed this 1 year ago
This is based off of #119 and will be rebased after that gets merged.
Thanks to @lconlon for starting out this test case with me in the Polaris Hackathon in May!
I have run all 4 geostrophic test cases on Chrysalis with Intel and OpenMPI. They ran successfully and produced the expected plots.
The order of convergence for water-column thickness on the QU grids is really poor -- about 0.4. That's discouraging for MPAS-Ocean but not an indication of a problem with the tests. The Icos meshes do much better -- about 1.6.
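For reference, an observed convergence order like the ones quoted above can be estimated from errors at two successive resolutions. This is a minimal stdlib-only sketch, not the polaris analysis code, and the error values are made up for illustration:

```python
import math

def convergence_order(width_coarse, width_fine, err_coarse, err_fine):
    """Observed order p, assuming err ~ C * width**p."""
    return math.log(err_coarse / err_fine) / math.log(width_coarse / width_fine)

# hypothetical errors at 480 km and 240 km cell widths (illustration only)
p = convergence_order(480.0, 240.0, 2.7e-2, 9.0e-3)
print(f'order: {p:.2f}')  # log(3)/log(2), about 1.58
```

Halving the cell width while the error drops by a factor of 3 gives an order of about 1.58, similar to the Icos result above.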
QU:
Icos:
Here are some plots of water-column thickness, u, v and normal velocity (for the icos 240km mesh):
Here are diff plots of water-column thickness for the QU and Icos meshes with increasing resolution.
I need to update this to use #126, so it will likely make sense to wait on reviewing this until after that PR has gone in.
@cbegeman, just an update that this is coming along and now uses the shared analysis, but I need to test it.
@xylar Just an FYI that I'll be posting a PR soon with some additional clean-up of the convergence analysis step. We might want to get this reviewed and merged first so that you don't have to do another rebase.
@cbegeman and @sbrus89, this is ready for review (again)!
@cbegeman, thanks for catching those mistakes, which are now fixed. Let me know if there are other changes you would like me to make.
@cbegeman, I'd prefer to wait until @sbrus89 has time to look at this. That may very well mean that I will rebase onto #131 once that's done. That's fine, since either I will have to rebase this or you will have to rebase that and I don't mind the extra work.
@xylar, I will get to this on Monday. Thanks for your patience.
@sbrus89, that would be great!
@sbrus89, thanks very much for your review! I think I've taken care of your suggestions. As I wrote above, I'd prefer to stick with the default resolutions rather than define them in geostrophic.cfg
@xylar - I'm testing a local merge and am seeing this from `polaris list`:
Traceback (most recent call last):
File "/lcrc/group/acme/sbrus/mambaforge/envs/dev_polaris_0.2.0/bin/polaris", line 33, in <module>
sys.exit(load_entry_point('polaris', 'console_scripts', 'polaris')())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lcrc/group/acme/sbrus/mambaforge/envs/dev_polaris_0.2.0/bin/polaris", line 25, in importlib_load_entry_point
return next(matches).load()
^^^^^^^^^^^^^^^^^^^^
File "/lcrc/group/acme/sbrus/mambaforge/envs/dev_polaris_0.2.0/lib/python3.11/importlib/metadata/__init__.py", line 202, in load
module = import_module(match.group('module'))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lcrc/group/acme/sbrus/mambaforge/envs/dev_polaris_0.2.0/lib/python3.11/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 940, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/gpfs/fs1/home/sbrus/polaris_worktrees/williamson/polaris/__main__.py", line 8, in <module>
from polaris import cache, list, setup, suite
File "/gpfs/fs1/home/sbrus/polaris_worktrees/williamson/polaris/list.py", line 6, in <module>
from polaris.components import get_components
File "/gpfs/fs1/home/sbrus/polaris_worktrees/williamson/polaris/components.py", line 5, in <module>
from polaris.ocean import Ocean
File "/gpfs/fs1/home/sbrus/polaris_worktrees/williamson/polaris/ocean/__init__.py", line 4, in <module>
from polaris.ocean.tasks.geostrophic import add_geostrophic_tasks
File "/gpfs/fs1/home/sbrus/polaris_worktrees/williamson/polaris/ocean/tasks/geostrophic/__init__.py", line 6, in <module>
from polaris.ocean.tasks.geostrophic.analysis import Analysis
File "/gpfs/fs1/home/sbrus/polaris_worktrees/williamson/polaris/ocean/tasks/geostrophic/analysis.py", line 3, in <module>
from polaris.ocean.convergence.spherical import SphericalConvergenceAnalysis
ImportError: cannot import name 'SphericalConvergenceAnalysis' from 'polaris.ocean.convergence.spherical' (/gpfs/fs1/home/sbrus/polaris_worktrees/williamson/polaris/ocean/convergence/spherical/__init__.py)
@xylar Maybe it's worth rebasing onto https://github.com/E3SM-Project/polaris/pull/131 in the process of addressing @sbrus89's issue? Sorry for the hassle.
@sbrus89, I'm not able to reproduce that issue locally. Could you try recreating your polaris conda environment to see if that helps?
Update: after rebasing, I see the issue. Now I get it, you did a local test merge with `main`?
But, yes, I'll rebase tomorrow and you can also try again after that. It might be best to wait.
@sbrus89, sorry for the confusion on my part yesterday. I was too tired, I guess, and I missed that this was with a test merge. I have rebased and fixed things up. Retesting now, but I think you can give it another try when you have time.
@xylar, no problem. Yes, I was testing with a local merge with `main`. I'll give it another go today. Thanks for rebasing!
I successfully tested the `convergence` suite with the `geostrophic` tests added:
Task Runtimes:
0:01:01 PASS ocean/planar/inertial_gravity_wave
0:00:55 FAIL ocean/planar/manufactured_solution
0:05:27 PASS ocean/spherical/icos/cosine_bell
0:09:15 PASS ocean/spherical/qu/cosine_bell
0:01:35 PASS ocean/spherical/icos/geostrophic
0:03:02 PASS ocean/spherical/qu/geostrophic
Total runtime: 0:21:16
I was having some performance problems with the `geostrophic/with_viz` tests that seem random but also persistent. I don't know if it's a Chrysalis problem, an issue with ESMF or something else. But I also hope we can replace this viz soon with something better, so I'm not inclined to investigate too much.
@xylar, I'm seeing this error:
ocean/spherical/icos/geostrophic
Traceback (most recent call last):
File "/global/cfs/cdirs/e3sm/sbrus/mambaforge/envs/dev_polaris_0.2.0/bin/polaris", line 33, in <module>
sys.exit(load_entry_point('polaris', 'console_scripts', 'polaris')())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/global/cfs/cdirs/e3sm/sbrus/polaris_worktrees/williamson/polaris/__main__.py", line 62, in main
commands[args.command]()
File "/global/cfs/cdirs/e3sm/sbrus/polaris_worktrees/williamson/polaris/run/serial.py", line 196, in main
run_tasks(suite_name='task', quiet=args.quiet, is_task=True,
File "/global/cfs/cdirs/e3sm/sbrus/polaris_worktrees/williamson/polaris/run/serial.py", line 103, in run_tasks
result_str, success, task_time = _log_and_run_task(
^^^^^^^^^^^^^^^^^^
File "/global/cfs/cdirs/e3sm/sbrus/polaris_worktrees/williamson/polaris/run/serial.py", line 316, in _log_and_run_task
task.steps_to_run = _update_steps_to_run(
^^^^^^^^^^^^^^^^^^^^^
File "/global/cfs/cdirs/e3sm/sbrus/polaris_worktrees/williamson/polaris/run/serial.py", line 222, in _update_steps_to_run
step_str = config.get(task_name, 'steps_to_run').replace(',', ' ')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/global/cfs/cdirs/e3sm/sbrus/mambaforge/envs/dev_polaris_0.2.0/lib/python3.11/site-packages/mpas_tools/config.py", line 115, in get
return self.combined.get(section, option)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/global/cfs/cdirs/e3sm/sbrus/mambaforge/envs/dev_polaris_0.2.0/lib/python3.11/configparser.py", line 797, in get
d = self._unify_values(section, vars)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/global/cfs/cdirs/e3sm/sbrus/mambaforge/envs/dev_polaris_0.2.0/lib/python3.11/configparser.py", line 1168, in _unify_values
raise NoSectionError(section) from None
configparser.NoSectionError: No section: 'icos_geostrophic'
I looked in the `geostrophic.cfg` file and only see an `icos_geostrophic_with_viz` section. I have a feeling this is because I set up the cases in this order: `ocean/spherical/icos/geostrophic`, `ocean/spherical/icos/geostrophic/with_viz` before running `ocean/spherical/icos/geostrophic`. Is this expected behavior?
@sbrus89, that does sound like a bug. Can you send me the commands you used to set up the 2 tasks? I want to make sure I can reproduce it. I always have been setting up both tasks together.
@xylar - I did the following:
polaris setup -t ocean/spherical/icos/geostrophic -w $WORKROOT/wiliamson
polaris setup -t ocean/spherical/icos/geostrophic/with_viz -w $WORKROOT/wiliamson
polaris setup -t ocean/spherical/qu/geostrophic -w $WORKROOT/wiliamson
polaris setup -t ocean/spherical/qu/geostrophic/with_viz -w $WORKROOT/wiliamson
and then
cd $WORKROOT/wiliamson/ocean/spherical/icos/geostrophic
sbatch job_script.sh
@sbrus89, great, thanks! I'm looking into this now.
Just as a tip to make your workflow more efficient, you may want to do:
polaris list
and then:
polaris setup -n 17 18 19 20 -w $WORKROOT/wiliamson
This will set up a custom suite with all 4 tasks. But it doesn't seem to expose the problem you showed so I'm glad you did things the "more tedious" way.
@sbrus89, I understand now what is going wrong, and I don't have a good idea what to do about it yet. It isn't anything to do with this PR; it was caused by #125. When you separately set up 2 tasks that share the same config file, the first task will create a section, option and value with its `steps_to_run`. Then, the second task will overwrite the config file, including only its own section, option and value with its `steps_to_run`. If you were to set them up together, both sections, options and values would be populated as expected:
[icos_geostrophic]
# A list of steps to include when running the icos_geostrophic task
# source: /gpfs/fs1/home/ac.xylar/e3sm_work/polaris/add-geostrophic-tests/polaris/setup.py
steps_to_run = icos_base_mesh_60km icos_init_60km icos_forward_60km icos_base_mesh_120km icos_init_120km icos_forward_120km icos_base_mesh_240km icos_init_240km icos_forward_240km icos_base_mesh_480km icos_init_480km icos_forward_480km analysis
[icos_geostrophic_with_viz]
# A list of steps to include when running the icos_geostrophic_with_viz task
# source: /gpfs/fs1/home/ac.xylar/e3sm_work/polaris/add-geostrophic-tests/polaris/setup.py
steps_to_run = icos_base_mesh_60km icos_init_60km icos_forward_60km icos_map_cell_60km icos_map_edge_60km icos_viz_60km icos_base_mesh_120km icos_init_120km icos_forward_120km icos_map_cell_120km icos_map_edge_120km icos_viz_120km icos_base_mesh_240km icos_init_240km icos_forward_240km icos_map_cell_240km icos_map_edge_240km icos_viz_240km icos_base_mesh_480km icos_init_480km icos_forward_480km icos_map_cell_480km icos_map_edge_480km icos_viz_480km analysis
Presumably, the solution will be to include `steps_to_run` from all tasks when writing out a shared config file. Since this is unrelated to `geostrophic`, I'll address it in a separate PR.
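The overwrite behavior described above can be demonstrated with a minimal `configparser` sketch. This is not the actual polaris setup code, just an illustration of the failure mode; the section names mirror the example above:

```python
import configparser
import os
import tempfile

def write_task_config(path, section, steps):
    """Naive sketch of the buggy behavior: each task builds its own parser
    and rewrites the whole shared file, clobbering sibling tasks' sections."""
    config = configparser.ConfigParser()
    config[section] = {'steps_to_run': ' '.join(steps)}
    with open(path, 'w') as config_file:
        config.write(config_file)

cfg_path = os.path.join(tempfile.mkdtemp(), 'geostrophic.cfg')

# setting the two tasks up separately: the second write clobbers the first
write_task_config(cfg_path, 'icos_geostrophic', ['init', 'forward', 'analysis'])
write_task_config(cfg_path, 'icos_geostrophic_with_viz',
                  ['init', 'forward', 'viz', 'analysis'])

check = configparser.ConfigParser()
check.read(cfg_path)
print(check.sections())  # only ['icos_geostrophic_with_viz'] survives
```

The fix sketched in the comment above would instead have each task read the existing file (or a shared in-memory parser) and merge its section in before writing.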
@sbrus89, would you like to review #134 and for me to merge that and rebase? Or can you review this without that fix?
@xylar, I figured it wasn't related to this PR. I'll just review it without the fix. I'll try it by creating a custom suite as you suggested.
Just a thought, it would be nice to be able to create custom suites with the test case names as opposed to numbers since those change as new tests are added.
@sbrus89, that's a good idea and easy to implement. I'll make a PR shortly.
@xylar, do you have any idea why I'd be seeing this:
Exception raised while running the steps of the task
Traceback (most recent call last):
File "/global/cfs/cdirs/e3sm/sbrus/polaris_worktrees/williamson/polaris/run/serial.py", line 326, in _log_and_run_task
baselines_passed = _run_task(task, available_resources)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/global/cfs/cdirs/e3sm/sbrus/polaris_worktrees/williamson/polaris/run/serial.py", line 407, in _run_task
_run_step(task, step, task.new_step_log_file,
File "/global/cfs/cdirs/e3sm/sbrus/polaris_worktrees/williamson/polaris/run/serial.py", line 492, in _run_step
step.runtime_setup()
File "/global/cfs/cdirs/e3sm/sbrus/polaris_worktrees/williamson/polaris/model_step.py", line 373, in runtime_setup
self._process_streams(quiet=quiet, remove_unrequested=False)
File "/global/cfs/cdirs/e3sm/sbrus/polaris_worktrees/williamson/polaris/model_step.py", line 610, in _process_streams
new_tree = yaml_to_mpas_streams(processed_registry_filename,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/global/cfs/cdirs/e3sm/sbrus/polaris_worktrees/williamson/polaris/yaml.py", line 224, in yaml_to_mpas_streams
with open(processed_registry_filename, 'r') as reg_file:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/global/cfs/cdirs/e3sm/sbrus/polaris_worktrees/williamson/global/cfs/cdirs/e3sm/sbrus/polaris_worktrees/williamson/e3sm_submodules/E3SM-Project/components/mpas-ocean/src/Registry_processed.xml'
It seems like something is going on with creating the path to the Registry. Part of the path to the worktree is being pre-pended to the full correct path: /global/cfs/cdirs/e3sm/sbrus/polaris_worktrees/williamson/global/cfs/cdirs/e3sm/sbrus/polaris_worktrees/williamson/e3sm_submodules/E3SM-Project/components/mpas-ocean/src/Registry_processed.xml
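The doubled path is consistent with a path that should be absolute being treated as relative and resolved against the work directory a second time. A hypothetical reconstruction (not the actual polaris code, and with a shortened path for readability):

```python
import os

# A path that was meant to be absolute is stored without being converted
# (or with its leading '/' effectively lost), so a later os.path.join
# prepends the work directory onto a path that already contains it.
work_dir = '/work/polaris'
stored = 'work/polaris/src/Registry_processed.xml'  # should have been absolute

resolved = os.path.join(work_dir, stored)
print(resolved)  # /work/polaris/work/polaris/src/Registry_processed.xml
```

Converting such paths once to absolute paths (e.g. with `os.path.abspath()`) at setup time, before they are stored, avoids the double prefix.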
@sbrus89, hmm, that looks like a bug where a relative path didn't get turned into an absolute path as it should have before it got stored in the pickle file. I have never seen that before, so I would need to know exactly what your `polaris setup` or `polaris suite` command was that led to this. Maybe it was as simple as not supplying `-p`, which I basically always do?
Sounds like it could be from not supplying `-p`. I don't typically do this unless I want an `ocean_model` outside the repo. I nearly always use `-p ../main/e3sm_submodules/E3SM-Project/components/mpas-ocean` so I don't have to rebuild MPAS-Ocean in each worktree I use.
@sbrus89, I'm not able to reproduce this with `main`, even without the `-p` flag. I doubt I introduced this issue in this PR. Could you give me the exact `polaris setup` or `polaris suite` command?
Oh, I take that back! I see it only in some tests and not others:
0:00:08 PASS ocean/planar/baroclinic_channel/10km/threads
0:00:03 PASS ocean/planar/baroclinic_channel/10km/decomp
0:00:00 FAIL ocean/planar/baroclinic_channel/10km/restart
0:00:01 FAIL ocean/planar/inertial_gravity_wave
0:00:02 PASS ocean/single_column/cvmix
0:00:02 PASS ocean/single_column/ideal_age
Again, not related to this PR but an important bug to fix.
@sbrus89, I see what changed and why. It's complicated to explain (to do with fancy config parsing and automatically converting to absolute paths) but probably easy to fix. Thanks for bringing this to my attention. Can you proceed with using the `-p` flag for now?
@xylar - yes, will do. Sorry for the headache on this!
Thanks so much, @sbrus89! And I really do appreciate you finding those issues, whether they are related or not.
@cbegeman, thank you very much for getting this test case started with excellent documentation and for thorough review! @lconlon, thank you for helping me get this test case started during the hackathon!
This merge adds the `geostrophic` convergence test case from Williamson et al. (1992). The test case starts out in geostrophic balance and ideally remains in steady state. The test case has similarities to `cosine_bell`, so efforts have been made to share code, primarily the `forward` step, between the two.

Checklist

- `api.md` has any new or modified class, method and/or functions listed
- Testing comment in the PR documents testing used to verify the changes