dougiesquire opened this issue 1 year ago
Good idea, and great news that memory scalability has improved. This has always been the sticking point for me in large calculations, and it would be good to have a recipe providing advice on large calculations.
Dask is now up to v2023.1.0. What version are we using in the latest conda environment? `analysis3-unstable` has 2022.12.1 and `analysis` has 2022.11.1. I'll probably set up a couple of environments specific to this task: one with the latest dask, and one pre-2022.11.0.
I'd love to hear from COSIMA folk whether there are any particular calculations/workflows that have historically been difficult to complete.
These notebooks (I think) were flagged as potential candidates:
Some of the above examples use the `cosima_cookbook` function `compute_by_blocks`. It may be possible to remove this with the new version of dask.
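For context, the idea behind a block-wise helper like this is to bound peak memory by computing the result one slab at a time, rather than handing the whole task graph to the scheduler at once. A minimal, hypothetical sketch of that pattern (this is not the actual `cosima_cookbook` implementation, just an illustration):

```python
import xarray as xr

def compute_by_blocks_sketch(da, dim="time", blocksize=10):
    """Hypothetical illustration of block-wise computation.

    Instead of calling .compute() on the full lazy array (which lets the
    scheduler hold many intermediate chunks in memory at once), compute
    one slab along `dim` at a time and stitch the results back together.
    """
    slabs = []
    for start in range(0, da.sizes[dim], blocksize):
        slab = da.isel({dim: slice(start, start + blocksize)})
        slabs.append(slab.compute())  # eager, so memory is bounded by one slab
    return xr.concat(slabs, dim=dim)
```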
This issue has been mentioned on ACCESS Hive Community Forum. There might be relevant details there:
https://forum.access-hive.org.au/t/cosima-hackathon-v2-0-tuesday-january-24th-2023/307/40
I chose to focus on the Decomposing_kinetic_energy_into_mean_and_transient example, which heavily uses the `compute_by_blocks` function. Replacing `compute_by_blocks` with a regular `compute` causes the dask workers to start spilling to disk, and the notebook dies before completion (using 7 CPUs, 32 GB). This is true even for the very simple TKE calculation.
It turns out that the grid of the `ocean_daily_3d_u_*.nc` files used in this notebook changes depending on which `output` directory you're looking at:
- `output196`-`output279` are on a regional domain with `yu_ocean: 900`, `xu_ocean: 3600`
- `output740`-`output799` are on a global domain with `yu_ocean: 2700`, `xu_ocean: 3600`
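A quick way to see this is to open a file or two from each output directory and inspect the dimension sizes. The experiment path below is a placeholder, not the real location:

```python
import xarray as xr

# Placeholder path; substitute the actual ACCESS-OM2 experiment location.
base = "/path/to/experiment"

for output in ["output196", "output279", "output740", "output799"]:
    ds = xr.open_mfdataset(f"{base}/{output}/ocean/ocean_daily_3d_u_*.nc")
    # Regional outputs report yu_ocean: 900; global outputs report yu_ocean: 2700
    print(output, dict(ds.sizes))
```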
Currently, in this notebook, the `cosima-cookbook` tries to whack these grids together when loading, which causes big issues for downstream analysis. I'll open a separate issue about this and link it soon (ADDED: link).

If we change the notebook to only load the global data and `compute` the TKE:
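For reference, a stripped-down sketch of that calculation using plain xarray (the notebook itself loads data through the `cosima_cookbook` querying functions; the glob pattern, chunking and variable names here are assumptions):

```python
import xarray as xr

# Minimal sketch, assuming: the global-domain outputs (output740-output799) are
# globbed directly, daily u and v live in ocean_daily_3d_u_*.nc /
# ocean_daily_3d_v_*.nc, and the velocity variables are named "u" and "v".
# The chunking is illustrative; in practice you'd tune it to the cluster.
u = xr.open_mfdataset(
    "/path/to/experiment/output7[4-9]?/ocean/ocean_daily_3d_u_*.nc",
    chunks={"time": 1},
)["u"]
v = xr.open_mfdataset(
    "/path/to/experiment/output7[4-9]?/ocean/ocean_daily_3d_v_*.nc",
    chunks={"time": 1},
)["v"]

# Transient kinetic energy: deviations from the time mean
u_anom = u - u.mean("time")
v_anom = v - v.mean("time")
tke = (0.5 * (u_anom**2 + v_anom**2)).mean("time")

# With dask >= 2022.11.0 this runs to completion with a plain compute,
# without the block-wise workaround.
tke = tke.compute()
```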
So, the scheduling update means that this notebook can now be run without the `compute_by_blocks` workaround.
Test environments and a stripped-back version of Decomposing_kinetic_energy_into_mean_and_transient.ipynb used for testing are here.
@dougiesquire does that mean we can remove `compute_by_blocks` from all notebooks? Is it only in Decomposing_kinetic_energy_into_mean_and_transient.ipynb? Are there any other changes to notebooks we need to make as a result of these dask improvements?
I'm not really sure without testing, sorry. In this particular instance `compute_by_blocks` was hiding a deeper issue with the data. The new dask scheduler helped, but only once the data issue was resolved. Each case may well be different.
Ok, let's work on this further at Hackathon 4.0 then.
Following @dougiesquire's testing above: `compute_by_blocks` can be removed from the Decomposing_kinetic_energy_into_mean_and_transient notebook. Perhaps more testing can also be done for other notebooks (though I'm not sure any other notebooks contain `compute_by_blocks`?).
Dask 2022.11.0 includes changes to the way tasks are scheduled that have improved (or made viable) many large geoscience workflows. Some details here and here.

It could be interesting/helpful to test how this change has impacted any particularly complex/slow/problematic COSIMA recipes.
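If it's useful for that testing, my understanding is that the scheduling change can be toggled within a single environment via the `distributed.scheduler.worker-saturation` setting, rather than maintaining separate pre/post-2022.11.0 environments. A sketch (the cluster sizing below is only meant to mirror the 7 CPU / 32 GB setup mentioned above):

```python
import dask
from dask.distributed import Client

# As far as I understand, the root-task queuing introduced in 2022.11.0 is
# controlled by this setting: the new default is 1.1, and setting it to "inf"
# restores the pre-2022.11.0 behaviour. It must be set *before* the cluster
# is created.
dask.config.set({"distributed.scheduler.worker-saturation": "inf"})  # old behaviour
# dask.config.set({"distributed.scheduler.worker-saturation": 1.1})  # new default

# Roughly the 7 CPU / 32 GB configuration used in the testing above
client = Client(n_workers=7, memory_limit="4GB")
```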