This PR optimizes the online diagnostic calculation and report generation for use on HPC systems, where many more cores are typically available. To achieve this, I standardized some of the common aggregation functions for taking means and switched to dask.distributed, which offers more fine-grained control over workers and memory usage. Additionally, the diagnostic functions are batched to run concurrently using joblib, while the individual dask tasks are elevated to the top level of the scheduler so that per-task operations and memory usage are tracked properly.
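A minimal sketch of this batching pattern (illustrative only; it assumes a local `dask.distributed.Client` and a stand-in diagnostic function, and is not the PR's actual code):

```python
# Sketch: joblib's threading backend batches the per-diagnostic calls,
# while each call's .compute() submits its dask tasks to the shared
# distributed scheduler, so individual task operations and memory usage
# show up on the scheduler dashboard.
import dask.array as da
import joblib
from dask.distributed import Client


def example_diagnostic(arr):
    # Stand-in for one diagnostic function (e.g., a global time mean).
    return arr.mean().compute()


if __name__ == "__main__":
    # Local cluster for the sketch; with --scheduler-file the diagnostics
    # can attach to an external multi-node cluster instead.
    client = Client()
    arrays = [da.random.random((1000, 1000), chunks=(250, 250)) for _ in range(8)]
    results = joblib.Parallel(n_jobs=4, backend="threading")(
        joblib.delayed(example_diagnostic)(arr) for arr in arrays
    )
    print(results)
```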
Refactored public API:
New flags for controlling compute parallelism in the online prognostic run `compute.py` (see the usage example after this list):
- `--scheduler-file`: provide a dask scheduler file for an external dask.distributed cluster when doing multi-node processing
- `--num-concurrent-2d-calcs`: number of concurrent 2D diagnostics (i.e., diagnostic functions) to execute; defaults to 8
- `--num-concurrent-3d-calcs`: number of concurrent 3D diagnostics (i.e., diagnostic functions) to execute; defaults to 2
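An illustrative invocation of the new flags (the scheduler-file path is a placeholder, and any other arguments `compute.py` requires are elided):

```bash
python compute.py \
    --scheduler-file /path/to/scheduler.json \
    --num-concurrent-2d-calcs 8 \
    --num-concurrent-3d-calcs 2 \
    ...
```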
Significant internal changes:
- Online diagnostics are now calculated on a dask.distributed cluster, with progress visible via the dashboard at `localhost:8787`, and parallelized using joblib over batches of diagnostic function calls
- Diagnostic plot generation has been parallelized using joblib
- The diurnal cycle calculation now uses flox to speed up the dask groupby operation (see the sketch after this list)
- `load_scream_data` now separates out physics-only tendencies for the `due_to_scream_physics` variables, whereas previously they also included the nudging/ML update
- `load_scream_data` now removes the ML precipitation from `PRATEsfc`, since the diagnostics expect this to be a physics-only quantity
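A minimal sketch of the flox-backed groupby pattern referenced above; the Dataset and variable name here are illustrative, not the actual SCREAM output:

```python
import numpy as np
import pandas as pd
import xarray as xr
from flox.xarray import xarray_reduce

# Hourly toy data standing in for model output; PRATEsfc is used only as
# an example variable name.
times = pd.date_range("2020-01-01", periods=24 * 10, freq="h")
ds = xr.Dataset(
    {"PRATEsfc": ("time", np.random.rand(times.size))},
    coords={"time": times},
).chunk({"time": 48})

# Group by hour of day and take the mean in a single dask-friendly pass,
# rather than xarray's default split-apply-combine groupby.
diurnal_cycle = xarray_reduce(ds, ds["time"].dt.hour, func="mean")
print(diurnal_cycle)
```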
Requirement changes:
Bulleted list, if relevant, of any changes to setup.py, requirements.txt, environment.yml, etc.
- [ ] Ran `make lock_deps/lock_pip` following these instructions
- [ ] Add PR review with license info for any additions to `constraints.txt` (example)
- [ ] Tests added
Coverage reports (updated automatically):