Closed alperaltuntas closed 6 months ago
Attention: 117 lines
in your changes are missing coverage. Please review.
Comparison is base (
d363034
) 37.90% compared to head (ef3e5a6
) 37.85%. Report is 1 commits behind head on dev/ncar.:exclamation: Current head ef3e5a6 differs from pull request most recent head 05cd9b9. Consider uploading reports for the commit 05cd9b9 to get more accurate results
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
This PR is fully tested and ready to be merged, but if @marshallward or others have comments, particularly regarding the to-do items listed in the PR description, please let us know.
Thanks for this @alperaltuntas, this looks potentially very useful in contexts where one is not constraints to a particular layout.
I added some specific comments in the code, and have some general thoughts below.
I am guessing that this ignores LAYOUT
and IO_LAYOUT
if the AUTO_MASKTABLE
is turned on, meaning that it would grab npes
from the launcher (mpirun
, srun
, etc.) What happens if LAYOUT
is set and there's a conflict? Should there be an error? I would discourage a WARNING
, since they often get ignored and lost in the stdout bloat.
Note that LAYOUT
can often be chosen for architectural reasons and might interfere with the ability to use the auto masktable.
It appears to me that gen_auto_mask_table
would be run on every re-submission, even though the result should not change. This could potentially be stored in the output, right? (Maybe this is what you meant by memoization.) If I am wrong about this and it is stored and reused, then please disregard.
On that note, could the static MOM_mask_table
and generated MOM_auto_mask_table
be merged into some common file (or "format")? Seems like the information ought to be similar.
I don't think there's any problem with keeping these subroutines in the MOM_domains
module. Moving to MOM_domains_infra
would seem incorrect, since there are no framework-dependent operations here. And if there's a way to move them without circular dependencies, we can do it in the future.
No thoughts yet on converting ibuf
, r_extreme
, etc into parameters, expectially if you feel the current values are already well-tuned. If it does ever happen, I'd probably recommend storing them in a parameter block to avoid bloat to MOM_parameter_doc.all
.
The 23% speedup: Does it refer to fewer CPU cycles? Or an actual reduction of runtime? In other words, if you had defined the layout and preprocessed this land mask, would you have gotten the same result?
See the inline comments on dimensionality, I have a feeling you will need to address this at some point.
Unfortunately I am short on time right now, but these are my thoughts for the moment. Feel free to act on them as you wish :smile:.
Also, this has absolutely no bearing on the PR, but I like using ib/ie
(as ibegin/end
) in place of is/ie
(as istart/end
). Using is
as a variable drives me crazy!
Thanks, @marshallward!
I couldn't locate your inline comments, but here are my quick responses to your bulletpoints above.
am guessing that this ignores LAYOUT and IO_LAYOUT if the AUTO_MASKTABLE is turned on, meaning that it would grab npes from the launcher (mpirun, srun, etc.) What happens if LAYOUT is set and there's a conflict? Should there be an error? I would discourage a WARNING, since they often get ignored and lost in the stdout bloat.
Right, LAYOUT
and IO_LAYOUT
are ignored when AUTO_MASKTABLE
is on, in which case npes is grabbed via MOM_coms_infra:: num_PEs
. And, LAYOUT
is auto-determined at runtime to maximize the number of eliminated land blocks. Similarly, IO_LAYOUT
is determined at runtime if AUTO_IO_LAYOUT_FAC
parameter is specified. Otherwise it's set to 1,1
. It's a good idea to throw an error when there is a discrepancy. Will do.
It appears to me
that gen_auto_mask_table
would be run on every re-submission, even though the result should not change. This could potentially be stored in the output, right? (Maybe this is what you meant by memoization.) If I am wrong about this and it is stored and reused, then please disregard.
Right, gen_auto_mask_table
is run on every re-submission if AUTO_MASKTABLE
remains True. I thought it would be fine to do so because it only takes around ~0.1 sec of runtime, and doing so would prevent the usage of outdated mask tables when the user changes things like the PE count, the topography, or minimum depth.
On that note, could the static MOM_mask_table and generated MOM_auto_mask_table be merged into some common file (or "format")? Seems like the information ought to be similar.
Indeed, MOM_auto_mask_table
has the same format as MOM_mask_table
. And, from FMS's point of view, there is no difference between the two.
I don't think there's any problem with keeping these subroutines in the MOM_domains module. Moving to MOM_domains_infra would seem incorrect, since there are no framework-dependent operations here. And if there's a way to move them without circular dependencies, we can do it in the future.
Sounds good!
The 23% speedup: Does it refer to fewer CPU cycles? Or an actual reduction of runtime?
It refers to the reduction of runtime when compared to a run with no (static or automated) masking. So, it just shows the impact of masking on the wallclock runtime.
In other words, if you had defined the layout and preprocessed this land mask, would you have gotten the same result?
Correct.
See the inline comments on dimensionality
Unfortunately, I can't see your inline comments for some reason.
Using is as a variable drives me crazy!
I agree!
Sorry about that, I thought my overall comment was in the report. I've just submitted it.
Right, gen_auto_mask_table is run on every re-submission if AUTO_MASKTABLE remains True. I thought it would be fine to do so because it only takes around ~0.1 sec of runtime, and doing so would prevent the usage of outdated mask tables when the user changes things like the PE count, the topography, or minimum depth.
If the runtime is this small, then perhaps it's not too important. (I believe I misinterpreted a comment somewhere.)
It might be meaningful to not enable the automasking if a MOM_mask_table
is present and enabled in MOM_input
. I don't have strong feelings about this, but maybe others do.
Aside from the inline comments (which have now been submitted), I think this looks good.
@gustavo-marques This PR is ready to be reviewed and merged. I believe @marshallward is working on a fix for the failing macOS tests, so I suppose we can ignore those CI failures for now.
This PR introduces two enhancements:
Automatic Mask Table Generation: This feature allows for the elimination of land blocks by automatically generating a mask_table during domain initialization at runtime. It eliminates the need for any preprocessing steps and tools, simplifying the user experience. It also ensures that the initial PE count set by the user is fully utilized, thereby eliminating the need for a clean re-build in CESM. Users can activate this option by setting the
AUTO_MASKTABLE
parameter to True. Relevant commits:Land block elimination support in the NUOPC cap. Relevant commits:
More on how automated land block elimination works:
npes
(set by the user), calculate an upper limit on the potential number of new domain divisions:npes/glob_ocn_frac
, where glob_ocn_frac is the ratio of total ocean cells to total cells. While this upper limit may be overly optimistic for realistic domains, it can be achievable for idealistic domains.p
to determine the layout and identify the maximum number of blocks that can be eliminated to meet the targetnpes
. Once this condition is met, finalize the iteration and adopt the determinedp
as the new number of divisions (wherep
-npes
correspond to the number of masked blocks.)This entire iteration is quite fast, taking less than 0.1 seconds for our 0.66-degree workhorse grid.
Performance results:
When auto land block elimination is turned on, we get 20 to 23% speed up. Below table summarizes the model throughput (simulated years per day) for 3-month long CMOM_JRA.TL319_t232 runs on derecho.intel with various target PE numbers.
896:![pe896](https://github.com/NCAR/MOM6/assets/12533965/3c001cfa-a06e-4313-92eb-6e77adca71d6)
1408:![pe1408](https://github.com/NCAR/MOM6/assets/12533965/d099eaa5-d653-4c3a-abeb-942361b5ead2)
Potential to-do items:
MOM _domains
module to a new module, sayMOM_auto_mask_table
. Note that this would necessitate moving theMOM_define_layout
subroutine to somewhere else (MOM_domains_infra
?) to prevent circular module dependency.ibuf
,r_extreme
, andauto_mask_table_fname
) to runtime parameters (though, if possible, I'd prefer to keep them as hard-coded parameters to keep the UX simple.)Testing
Ongoing. No answer changes and no issues so far.