ai2cm / fv3net

explore the FV3 data for parameterization
MIT License

Optimize diagnostics for larger HPC nodes #2396

Closed frodre closed 2 months ago

frodre commented 2 months ago

This PR optimizes the online diagnostic calculation and report generation for HPC nodes, which typically have many more cores. To do this, I standardized some of the common aggregation functions for taking means and switched to dask distributed, which gives finer-grained control over workers and memory usage. The diagnostic functions are also batched to run concurrently with joblib, while the individual dask tasks are elevated to the top level of the scheduler so that task operations and memory usage are tracked per task.
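For readers unfamiliar with the pattern, here is a minimal sketch of the batching idea described above. It is not the code from this PR: it uses the standard library's `concurrent.futures` as a stand-in for both joblib and dask distributed so that it runs without extra dependencies, and every name in it (`DATA`, `tile_mean`, `make_mean_diagnostic`, `run_diagnostics`) is hypothetical. One pool runs the diagnostic functions concurrently, and each diagnostic submits its fine-grained work to a single shared pool rather than computing it inline, which is roughly what elevating tasks to the top-level scheduler means here.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import fmean

# Hypothetical stand-in data: variable name -> list of per-tile values.
DATA = {
    "temperature": [[280.0, 282.0], [284.0, 286.0]],
    "humidity": [[0.25, 0.75], [0.5, 1.0]],
}


def tile_mean(values):
    """Smallest unit of work: the mean of a single tile."""
    return fmean(values)


def make_mean_diagnostic(name):
    """Build a diagnostic that averages one variable over all of its tiles."""

    def diagnostic(task_pool):
        # Submit per-tile work to the shared pool instead of computing it
        # inline, so one scheduler sees every individual task.
        futures = [task_pool.submit(tile_mean, tile) for tile in DATA[name]]
        return name, fmean([f.result() for f in futures])

    return diagnostic


def run_diagnostics(diagnostics, n_batch=2, n_workers=4):
    """Run diagnostics concurrently; all fine-grained tasks share one pool."""
    with ThreadPoolExecutor(max_workers=n_workers) as task_pool, \
            ThreadPoolExecutor(max_workers=n_batch) as batch_pool:
        # batch_pool plays the role of the joblib batch; task_pool plays
        # the role of the top-level dask scheduler.
        results = batch_pool.map(lambda d: d(task_pool), diagnostics)
        return dict(results)


results = run_diagnostics([make_mean_diagnostic(name) for name in DATA])
print(results)
```

Keeping the two pools separate means a diagnostic waiting on its results never occupies a worker slot that its own tasks need. With dask distributed, worker count and per-worker memory limits would be set on the cluster instead of through `max_workers`.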

Refactored public API:

Significant internal changes:

Requirement changes:
