NCAR / MOM6

NCAR/CESM fork of the Modular Ocean Model v.6 (MOM6)

Automated Runtime Land Block Elimination #263

Closed alperaltuntas closed 6 months ago

alperaltuntas commented 8 months ago

This PR introduces two enhancements:

  1. Automatic Mask Table Generation: This feature allows for the elimination of land blocks by automatically generating a mask_table during domain initialization at runtime. It eliminates the need for any preprocessing steps and tools, simplifying the user experience. It also ensures that the initial PE count set by the user is fully utilized, thereby eliminating the need for a clean re-build in CESM. Users can activate this option by setting the AUTO_MASKTABLE parameter to True. Relevant commits:

  2. Land block elimination support in the NUOPC cap. Relevant commits:

More on how automated land block elimination works:

This entire iteration is quite fast, taking less than 0.1 seconds for our 0.66-degree workhorse grid.
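
For readers new to the idea, here is a minimal Python sketch of land block elimination for one fixed layout. It is not the Fortran added in this PR. The even block split, the neglect of halo points, and the names `block_edges`, `land_blocks`, and `mask_table_text` are all assumptions made for illustration. The mask table text (masked-block count, then the layout, then one `i,j` pair per line) follows the format FMS is generally understood to read and should be checked against the FMS documentation.

```python
def block_edges(n_cells, n_blocks):
    """Start/end cell indices of each block for a near-even split (assumed)."""
    return [(k * n_cells) // n_blocks for k in range(n_blocks + 1)]


def land_blocks(ocean_mask, layout):
    """Return 1-based (i, j) indices of blocks that contain no ocean cell.

    ocean_mask: list of rows (j index first), True where a cell is ocean.
    layout: (nx, ny), the number of blocks in the i and j directions.
    """
    nj, ni = len(ocean_mask), len(ocean_mask[0])
    nx, ny = layout
    ie, je = block_edges(ni, nx), block_edges(nj, ny)
    masked = []
    for j in range(ny):
        for i in range(nx):
            has_ocean = any(
                ocean_mask[jj][ii]
                for jj in range(je[j], je[j + 1])
                for ii in range(ie[i], ie[i + 1])
            )
            if not has_ocean:
                masked.append((i + 1, j + 1))
    return masked


def mask_table_text(masked, layout):
    """Format masked blocks as mask-table text (format is an assumption)."""
    lines = [str(len(masked)), f"{layout[0]},{layout[1]}"]
    lines += [f"{i},{j}" for i, j in masked]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy 8x4 grid whose two westernmost columns are land.
    mask = [[col >= 2 for col in range(8)] for _ in range(4)]
    print(mask_table_text(land_blocks(mask, (4, 2)), (4, 2)), end="")
```

Each masked block is one that no PE needs to own, so the number of PEs actually required is the block count minus the number of masked blocks.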

Performance results:

When auto land block elimination is turned on, we get a 20 to 23% speedup. The table below summarizes the model throughput (simulated years per day) for 3-month-long CMOM_JRA.TL319_t232 runs on derecho.intel with various target PE counts.

| MOM6 PEs | Throughput, base (SYPD) | Throughput, auto LBE on (SYPD) |
|---|---|---|
| 640 | 18.89 | 24.29 |
| 896 | 25.75 | 32.11 |
| 1152 | 31.18 | 40.01 |
| 1280 | 34.62 | 42.15 |
| 1408 | 37.19 | 46.81 |
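
A speedup percentage can be read either as an increase in throughput or as a reduction in wall-clock time, and the two differ for the same pair of runs. The short Python sketch below computes both from the table values. It assumes only that, for a fixed simulated period, wall-clock time is inversely proportional to throughput; the function name `gains` is hypothetical.

```python
# (MOM6 PEs, base throughput, throughput with auto LBE on), in simulated years per day
ROWS = [
    (640, 18.89, 24.29),
    (896, 25.75, 32.11),
    (1152, 31.18, 40.01),
    (1280, 34.62, 42.15),
    (1408, 37.19, 46.81),
]


def gains(base, lbe):
    """Return (throughput increase %, wall-clock time reduction %)."""
    throughput_gain = (lbe / base - 1.0) * 100.0
    walltime_cut = (1.0 - base / lbe) * 100.0
    return throughput_gain, walltime_cut


if __name__ == "__main__":
    for pes, base, lbe in ROWS:
        tg, wc = gains(base, lbe)
        print(f"{pes:5d} PEs: +{tg:.1f}% throughput, -{wc:.1f}% wall-clock time")
```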

896 PEs: [image: pe896]

1408 PEs: [image: pe1408]

Potential to-do items:

Testing

Ongoing. No answer changes and no issues so far.

codecov-commenter commented 8 months ago

Codecov Report

Attention: 117 lines in your changes are missing coverage. Please review.

Comparison is base (d363034) 37.90% compared to head (ef3e5a6) 37.85%. Report is 1 commits behind head on dev/ncar.

:exclamation: Current head ef3e5a6 differs from pull request most recent head 05cd9b9. Consider uploading reports for the commit 05cd9b9 to get more accurate results

| Files | Patch % | Lines |
|---|---|---|
| src/framework/MOM_domains.F90 | 9.60% | 108 Missing and 5 partials :warning: |
| config_src/infra/FMS1/MOM_domain_infra.F90 | 0.00% | 3 Missing :warning: |
| src/ocean_data_assim/MOM_oda_driver.F90 | 0.00% | 1 Missing :warning: |
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           dev/ncar     #263      +/-   ##
============================================
- Coverage     37.90%   37.85%   -0.06%
============================================
  Files           269      269
  Lines         77176    77302     +126
  Branches      14170    14194      +24
============================================
+ Hits          29255    29263       +8
- Misses        42641    42754     +113
- Partials       5280     5285       +5
```


alperaltuntas commented 7 months ago

This PR is fully tested and ready to be merged, but if @marshallward or others have comments, particularly regarding the to-do items listed in the PR description, please let us know.

marshallward commented 7 months ago

Thanks for this @alperaltuntas, this looks potentially very useful in contexts where one is not constrained to a particular layout.

I added some specific comments in the code, and have some general thoughts below.

Unfortunately I am short on time right now, but these are my thoughts for the moment. Feel free to act on them as you wish :smile:.

Also, this has absolutely no bearing on the PR, but I like using ib/ie (as ibegin/end) in place of is/ie (as istart/end). Using is as a variable drives me crazy!

alperaltuntas commented 7 months ago

Thanks, @marshallward!

I couldn't locate your inline comments, but here are my quick responses to your bullet points above.

> I am guessing that this ignores LAYOUT and IO_LAYOUT if AUTO_MASKTABLE is turned on, meaning that it would grab npes from the launcher (mpirun, srun, etc.). What happens if LAYOUT is set and there's a conflict? Should there be an error? I would discourage a WARNING, since they often get ignored and lost in the stdout bloat.

Right, LAYOUT and IO_LAYOUT are ignored when AUTO_MASKTABLE is on, in which case npes is obtained via MOM_coms_infra::num_PEs. LAYOUT is then determined automatically at runtime to maximize the number of eliminated land blocks. Similarly, IO_LAYOUT is determined at runtime if the AUTO_IO_LAYOUT_FAC parameter is specified; otherwise it is set to 1,1. It's a good idea to throw an error when there is a discrepancy. Will do.
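
To make the layout selection concrete, here is a rough Python sketch of one way such a search could work. It is not the Fortran in this PR, and the actual search may differ. The brute-force scan, the even block split, the `max_factor` cap, and the names `count_land_blocks` and `choose_layout` are assumptions for illustration only. The sketch looks for a layout whose block count minus its all-land blocks equals the available PE count, and prefers the layout that eliminates the most blocks.

```python
def block_edges(n_cells, n_blocks):
    """Start/end cell indices of each block for a near-even split (assumed)."""
    return [(k * n_cells) // n_blocks for k in range(n_blocks + 1)]


def count_land_blocks(ocean_mask, nx, ny):
    """Count blocks of an nx-by-ny layout that contain no ocean cell."""
    nj, ni = len(ocean_mask), len(ocean_mask[0])
    ie, je = block_edges(ni, nx), block_edges(nj, ny)
    n_land = 0
    for j in range(ny):
        for i in range(nx):
            if not any(
                ocean_mask[jj][ii]
                for jj in range(je[j], je[j + 1])
                for ii in range(ie[i], ie[i + 1])
            ):
                n_land += 1
    return n_land


def choose_layout(ocean_mask, npes, max_factor=2):
    """Return (nx, ny, n_land) with nx*ny - n_land == npes and n_land maximal.

    max_factor caps the scan at layouts with at most max_factor*npes blocks
    (an arbitrary limit chosen for this sketch).
    """
    nj, ni = len(ocean_mask), len(ocean_mask[0])
    best = None
    for nx in range(1, ni + 1):
        for ny in range(1, nj + 1):
            total = nx * ny
            if total < npes or total > max_factor * npes:
                continue
            n_land = count_land_blocks(ocean_mask, nx, ny)
            if total - n_land == npes and (best is None or n_land > best[2]):
                best = (nx, ny, n_land)
    if best is None:
        # An error rather than a warning, in the spirit of the discussion above.
        raise ValueError(f"no layout found that uses exactly {npes} PEs")
    return best


if __name__ == "__main__":
    # Toy 8x4 grid whose two westernmost columns are land.
    mask = [[col >= 2 for col in range(8)] for _ in range(4)]
    print(choose_layout(mask, 6))
```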

> It appears to me that gen_auto_mask_table would be run on every re-submission, even though the result should not change. This could potentially be stored in the output, right? (Maybe this is what you meant by memoization.) If I am wrong about this and it is stored and reused, then please disregard.

Right, gen_auto_mask_table is run on every re-submission if AUTO_MASKTABLE remains True. I thought it would be fine to do so because it only takes around ~0.1 sec of runtime, and doing so would prevent the usage of outdated mask tables when the user changes things like the PE count, the topography, or minimum depth.

> On that note, could the static MOM_mask_table and generated MOM_auto_mask_table be merged into some common file (or "format")? Seems like the information ought to be similar.

Indeed, MOM_auto_mask_table has the same format as MOM_mask_table. And, from FMS's point of view, there is no difference between the two.

> I don't think there's any problem with keeping these subroutines in the MOM_domains module. Moving to MOM_domains_infra would seem incorrect, since there are no framework-dependent operations here. And if there's a way to move them without circular dependencies, we can do it in the future.

Sounds good!

> The 23% speedup: Does it refer to fewer CPU cycles? Or an actual reduction of runtime?

It refers to the reduction of runtime when compared to a run with no (static or automated) masking. So, it just shows the impact of masking on the wallclock runtime.

> In other words, if you had defined the layout and preprocessed this land mask, would you have gotten the same result?

Correct.

> See the inline comments on dimensionality

Unfortunately, I can't see your inline comments for some reason.

> Using is as a variable drives me crazy!

I agree!

marshallward commented 7 months ago

Sorry about that, I thought my overall comment was in the report. I've just submitted it.

> Right, gen_auto_mask_table is run on every re-submission if AUTO_MASKTABLE remains True. I thought it would be fine to do so because it only takes around ~0.1 sec of runtime, and doing so would prevent the usage of outdated mask tables when the user changes things like the PE count, the topography, or minimum depth.

If the runtime is this small, then perhaps it's not too important. (I believe I misinterpreted a comment somewhere.)

It might be meaningful to not enable the automasking if a MOM_mask_table is present and enabled in MOM_input. I don't have strong feelings about this, but maybe others do.

Aside from the inline comments (which have now been submitted), I think this looks good.

alperaltuntas commented 6 months ago

@gustavo-marques This PR is ready to be reviewed and merged. I believe @marshallward is working on a fix for the failing macOS tests, so I suppose we can ignore those CI failures for now.