E3SM-Project / polaris

Testing and analysis for OMEGA, MPAS-Ocean, MALI and MPAS-Seaice
BSD 3-Clause "New" or "Revised" License
6 stars 13 forks source link

Port sphere transport test group for ocean components #104

Closed cbegeman closed 1 year ago

cbegeman commented 1 year ago

This PR ports the sphere_transport test group from compass.

Checklist

cbegeman commented 1 year ago

Testing

All new test cases have been run on compy with intel, impi.

Tests have been rerun on perlmutter with gnu, mpich.

xylar commented 1 year ago

I really support the idea of using common steps for the 4 if possible. Let me know how that goes and if I can help.

cbegeman commented 1 year ago

The fields differ for each test case, but each test case has the top and middle plots:

image image triplots

cbegeman commented 1 year ago

@xylar I am using common steps. I just realized on opening this pr that I forgot to git remove the other files. It should be more clear now.

cbegeman commented 1 year ago

The order of convergence was reduced for the rotation_2d case. I believe the initial state implementation is without error based on the visualization of the solution, but it is worth careful review:

compass (MPAS-Ocean init mode): image

polaris: image

cbegeman commented 1 year ago

@xylar I'm marking this in progress because I haven't gotten around to the docs or doc strings. However, since I'm out until Aug 9 and this PR is quite large, I'm be happy to get feedback on what I've done so far. I did a lot of clean-up relative to the compass version, and it still feels like quite a lot more clean-up could happen (particularly in the common subroutines). I'm getting a bit of clean-up fatigue so I'd also welcome help with this if you feel up for it!

xylar commented 1 year ago

I have 2 commits with some pretty significant reorganization on my version of this branch. If you're good with them, you could cherry-pick them: https://github.com/xylar/polaris/commits/ocn-port-sphere-transport

The caveat is that I haven't had a chance to test them yet. I will hopefully manage that before you come back.

cbegeman commented 1 year ago

@xylar Thanks for making those changes. I skimmed and the overall idea looks great. Do you want to test out those changes when you have a chance and then push those commits to my fork? You should have permission. I won't have time before I leave. If you don't get to it, I'll test when I get back next Wednesday.

xylar commented 1 year ago

I believe I have fixed the rotation 2d test: convergence

There were a couple of issues still with flow_rotation() but the primary problem was that none of the velocities were getting multiplied by the Earth radius, so they were all orders of magnitude too small. That's true for all 4 tests, not just rotation_2d. I'll verify that all tests are okay after the fix but I compared normalVelocity in initial_state.nc for rotation_2d with the compass version and they look visually identical.

xylar commented 1 year ago

@cbegeman, with my clean up and bug fixes, all 8 test cases seem to be passing on Chrysalis. All 4 initial normal velocity fields look visually identical to their compass counterparts in Paraview.

So I think this is just waiting on a small amount of additional code clean-up (see above) and the docstrings (mostly done) and documentation. Should be very close to merging when you come back.

cbegeman commented 1 year ago

@xylar Thanks for catching the normal velocity scaling error!

Before I write up the docs, maybe we should chat about whether it makes sense to combine these tests with the global_convergence test group. I had intended to do this eventually, but it's feeling potentially worthwhile to do it "now" after rebasing onto https://github.com/E3SM-Project/polaris/pull/108. The cosine bell test is really just another combination of tracer distribution and flow type as I understand it.

xylar commented 1 year ago

Yes, I totally agree that they should be combined. Let's chat about that tomorrow.

xylar commented 1 year ago

@cbegeman, any chance you can also make use of #119 in this? Is there other shared infrastructure you can imagine putting there instead of in an individual test case? Are there changes you'd like to make there to better accommodate this work?

cbegeman commented 1 year ago

@cbegeman, any chance you can also make use of https://github.com/E3SM-Project/polaris/pull/119 in this? Is there other shared infrastructure you can imagine putting there instead of in an individual test case? Are there changes you'd like to make there to better accommodate this work?

Yes, certainly! I can't think of anything else off the top of my head besides the analysis step. If you want to wait a day or two, I could rebase this PR on #119 and see if anything comes up. If not, you can proceed with merging #119 (when I finish my review).

cbegeman commented 1 year ago

@xylar You can wait to review this until after we get https://github.com/E3SM-Project/polaris/pull/126 and https://github.com/E3SM-Project/polaris/pull/127 in. Your feedback on those could affect this PR (though changes will probably be minor).

cbegeman commented 1 year ago

We have new viz!

The triplots now have more labels (with one additional resolution for icos shown here): image

Formerly, the filament preservation diagnostic was computed but not plotted. This figure can be directly compared to Lauritzen et al. Figure 6: image

I split up the multipanel "solution" figures into one figure for each panel. This allows us to use existing spherical viz functionality easily. For example: tracer3_diff

Oh, and we also have new convergence plots, but that is covered by the shared spherical convergence analysis step: convergence_tracer1

cbegeman commented 1 year ago

@xylar This is ready for review. I based it on https://github.com/E3SM-Project/polaris/pull/131. I did notice that the QU version of this plot is cramped, but I don't think it's a big enough issue to hold up review. image

xylar commented 1 year ago

@cbegeman, wonderful! I'll take a look while you're away. I agree that we should fix the plot above but there's no need for that to hold up my review.

xylar commented 1 year ago

I'm getting an error related to step.config.filepath being None when analysis runs the second time in a suite after having failed once the first time. This is happening for icos/nonconvergent_2d and possibly others. I haven't had any luck tracking it down so far but I'll keep working on it tomorrow...

xylar commented 1 year ago

In the meantime, a rebase would help make this review a little easier.

cbegeman commented 1 year ago

@xylar Thanks for starting the review. I haven't run that particular case before so that's interesting. I rebased but didn't retest. It looks like only #131 went in, so hopefully nothing is broken.

xylar commented 1 year ago

I wonder if a few changes got lost in the rebase (e.g. if the branch you rebased from wasn't the most up-to-date version). I didn't see some of the problems we're now seeing now when I did the rebase myself in: https://github.com/xylar/polaris/tree/ocn-port-sphere-transport

cbegeman commented 1 year ago

I wonder if a few changes got lost in the rebase (e.g. if the branch you rebased from wasn't the most up-to-date version). I didn't see some of the problems we're now seeing now when I did the rebase myself in: https://github.com/xylar/polaris/tree/ocn-port-sphere-transport

Yes, that's exactly right. I'm not sure why that happened. I'll try to recover those changes and ping you again.

xylar commented 1 year ago

My branch might help with the recovery. And it's no rush. I'll call it a night.

xylar commented 1 year ago

@cbegeman, the issue I was seeing in https://github.com/E3SM-Project/polaris/pull/104#issuecomment-1765121154 wasn't related to this PR except that this PR happened to trigger the issue. I fixed it in #138 and am no longer seeing it in my testing when I did a test merge of this branch with main.

When I run all 16 tasks in a suite, I'm seeing that the mapping and viz steps slow down substantially over the course of the suite. I have seen behavior like this with cosine bell but it never got as bad because there are simply fewer tasks. I have absolutely no idea what this is caused by, maybe some kind of a memory issue, but I hope that switching to uxarray and holoviews will circumvent it.

Anyways, I'm running now with only the taks without viz, since there doesn't seem to be any trouble with the tasks with viz other than that they run more and more slowly as the suite goes on. If all goes well, I'll fix the one little issue I pointed out above in https://github.com/E3SM-Project/polaris/pull/104#discussion_r1360695844 and approve.

xylar commented 1 year ago

@cbegeman, thanks for the tremendous effort on this one! Feel free to merge as soon as CI passes and you're happy with the state of the branch.

cbegeman commented 1 year ago

Exciting! I'm seeing a failure due to too-low convergence in the icos/nondivergent_2d test. I assume that's expected, though it's interesting that that implies the icos is converging more slowly than the qu since they use the same threshold, right?

@xylar Can you post the convergence rate you're getting for icos/nondivergent_2d? I tried to set the convergence rate to the lowest of the two convergence rates so that all tests pass, so I'll just update the threshold.

xylar commented 1 year ago

I'm seeing:

Error: order of convergence for tracer1
  1.566 < min tolerance 1.6
cbegeman commented 1 year ago

@xylar Thanks for your review! I fixed the convergence rate for that one case and merged.