COSIMA / cosima-recipes

A cookbook of recipes (i.e., examples) for analysing ocean and sea ice model output
https://cosima-recipes.readthedocs.io
Apache License 2.0

Test/document how recent dask improvements affect particularly sticky calculations #191

Open dougiesquire opened 1 year ago

dougiesquire commented 1 year ago

Dask 2022.11.0 includes changes to the way tasks are scheduled that have improved, or made viable, many large geoscience workflows. Some details here and here.

It could be interesting/helpful to test how this change has impacted any particularly complex/slow/problematic cosima recipes.
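For anyone wanting to do before/after comparisons, here is a minimal sketch (illustrative, not from any recipe) of toggling the new scheduling behaviour. From dask/distributed 2022.11.0 onwards, root-task queuing is on by default and is controlled by the `distributed.scheduler.worker-saturation` config key; setting it to `"inf"` restores the older behaviour.

```python
# Sketch only: switch between new and old dask scheduling for side-by-side tests
import dask
from dask.distributed import Client

# New behaviour (the default from 2022.11.0): root tasks are queued
with dask.config.set({"distributed.scheduler.worker-saturation": 1.1}):
    client = Client()          # local cluster, e.g. on a compute node
    # ... run the recipe's calculation ...
    client.close()

# Pre-2022.11.0-style scheduling, for comparison
with dask.config.set({"distributed.scheduler.worker-saturation": "inf"}):
    client = Client()
    # ... rerun the same calculation ...
    client.close()
```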

aekiss commented 1 year ago

Good idea, and great news that memory scalability has improved. This has always been the sticking point for me in large calculations, and it would be good to have a recipe providing advice on large calculations.

Dask is now up to v. 2023.1.0. What version are we using in the latest Conda environment?

dougiesquire commented 1 year ago

analysis3-unstable has 2022.12.1 and analysis has 2022.11.1. I'll probably set up a couple of environments specific to this task: one with the latest dask, and one with a pre-2022.11.0 version.
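For reference, the quickest way to confirm which versions a given analysis environment provides is to import them there:

```python
# Run inside the environment of interest (e.g. analysis3-unstable)
import dask
import distributed

print(dask.__version__, distributed.__version__)
```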

dougiesquire commented 1 year ago

I'd love to hear from COSIMA folk whether there are any particular calculations/workflows that have historically been difficult to complete.

dougiesquire commented 1 year ago

These notebooks (I think) were flagged as potential candidates:

Some of the above examples use the cosima_cookbook function compute_by_blocks. It may be possible to remove this workaround with the new version of dask.
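As a rough illustration of the kind of simplification being proposed (the compute_by_blocks import path and signature aren't shown here, and the file pattern and variable name are placeholders), the idea is to hand a chunked reduction straight to dask rather than computing it block by block:

```python
# Hypothetical before/after sketch of dropping the compute_by_blocks workaround
import xarray as xr
from dask.distributed import Client

client = Client()

u = xr.open_mfdataset("ocean_daily_3d_u_*.nc", chunks={"time": 1})["u"]
mean_u_squared = (u ** 2).mean("time")

# Before: result = compute_by_blocks(mean_u_squared)   # cosima_cookbook helper
# After (with dask >= 2022.11.0, memory use is often manageable):
result = mean_u_squared.compute()
```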

access-hive-bot commented 1 year ago

This issue has been mentioned on ACCESS Hive Community Forum. There might be relevant details there:

https://forum.access-hive.org.au/t/cosima-hackathon-v2-0-tuesday-january-24th-2023/307/40

dougiesquire commented 1 year ago

I chose to focus on the Decomposing_kinetic_energy_into_mean_and_transient example, which heavily uses the compute_by_blocks function. Replacing compute_by_blocks with a regular compute causes dask workers to start spilling to disk, and the notebook dies before completion (using 7 CPUs, 32 GB). This is true even for the very simple TKE calculation.
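For context, a comparable local cluster within that budget might look something like the following (only the 7 CPU / 32 GB total comes from the text above; the worker/thread split is illustrative):

```python
# Illustrative only: a local dask cluster matching a 7 CPU / 32 GB budget
from dask.distributed import Client

# memory_limit is per worker, so 7 x 4.5 GB is roughly 32 GB total
client = Client(n_workers=7, threads_per_worker=1, memory_limit="4.5GB")
```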

It turns out that the grid of the ocean_daily_3d_u_*.nc files used in this notebook changes depending on which output directory you're looking at:

Currently, in this notebook, the cosima-cookbook tries to concatenate these differing grids together when loading, which causes big issues for downstream analysis. I'll open a separate issue about this and link it soon (ADDED: link).
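A quick way to see the grid inconsistency for yourself is to compare the dimension sizes of the u files across output directories before concatenating anything (paths and variable names below are placeholders):

```python
# Sketch: check whether the horizontal grid of the u files is consistent
import glob
import xarray as xr

for path in sorted(glob.glob("output*/ocean/ocean_daily_3d_u_*.nc")):
    with xr.open_dataset(path) as ds:
        print(path, dict(ds["u"].sizes))  # differing sizes flag the grid change
```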

If we change the notebook to only load the global data and compute the TKE:

So, the scheduling update means that this notebook can now be run without using the compute_by_blocks workaround.
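In other words, the stripped-back calculation can now be written in the plainest way. A sketch of the TKE step under that simplification (experiment name, variable names and frequency string are placeholders, not the notebook's exact ones):

```python
# Sketch of the "global data only" TKE step, computed directly rather than via
# compute_by_blocks
import cosima_cookbook as cc

session = cc.database.create_session()
u = cc.querying.getvar("expt_name", "u", session, frequency="1 daily")
v = cc.querying.getvar("expt_name", "v", session, frequency="1 daily")

tke = (0.5 * (u ** 2 + v ** 2)).mean("time")
tke = tke.compute()  # viable with dask >= 2022.11.0; previously this spilled to disk
```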

dougiesquire commented 1 year ago

Test environments and a stripped-back version of Decomposing_kinetic_energy_into_mean_and_transient.ipynb used for testing are here.

adele-morrison commented 6 months ago

@dougiesquire does that mean we can remove compute_by_blocks from all notebooks? Is this only in Decomposing_kinetic_energy_into_mean_and_transient.ipynb? Are there any other changes to notebooks we need to make as a result of these dask improvements?

dougiesquire commented 6 months ago

I'm not really sure without testing, sorry. In this particular instance compute_by_blocks was hiding a deeper issue with the data. The new dask scheduler helped, but only once the data issue was resolved. Each case may well be different.

adele-morrison commented 6 months ago

Ok, let's work on this further at Hackathon 4.0 then.

Following @dougiesquire's testing above: compute_by_blocks can be removed from the Decomposing_kinetic_energy_into_mean_and_transient notebook.

Perhaps more testing can also be done for other notebooks (though I'm not sure any other notebooks contain compute_by_blocks?).