Klimaatlas / KAPy

Klimaatlases in Python
MIT License

Consider support for dask and lazy loading #90

Open markpayneatwork opened 6 months ago

markpayneatwork commented 6 months ago

Is your feature request related to a problem? Please describe. KAPy is built upon xarray, and xarray supports dask for parallel reading, processing and writing of data. It's a great tool and can potentially give substantial speed improvements.

Describe the solution you'd like Provide support for dask in KAPy, either enabled by default or via a switch.
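
To make the "via a switch" idea concrete, here is a minimal sketch of what such a toggle could look like. The helper `apply_dask_switch`, its `use_dask` flag, and the default chunk size are all hypothetical and not part of KAPy; the synthetic dataset stands in for real input files, and dask must be installed for the chunked path.

```python
import numpy as np
import xarray as xr


def apply_dask_switch(ds, use_dask=False, chunks=None):
    """Hypothetical helper: return ds either dask-backed or fully in memory."""
    if use_dask:
        # Requires dask; variables become lazy, chunked dask arrays.
        # The default chunking along "time" is an arbitrary illustrative choice.
        return ds.chunk(chunks if chunks is not None else {"time": 10})
    # Default path: make sure everything is held as NumPy arrays in memory.
    return ds.load()


# Small synthetic dataset standing in for a climate variable.
ds = xr.Dataset(
    {"tas": (("time", "lat", "lon"), np.random.rand(30, 4, 5))},
    coords={
        "time": np.arange(30),
        "lat": np.linspace(54.0, 58.0, 4),
        "lon": np.linspace(8.0, 13.0, 5),
    },
)

lazy = apply_dask_switch(ds, use_dask=True)
eager = apply_dask_switch(ds, use_dask=False)

print(lazy["tas"].chunks)   # chunk sizes per dimension (dask-backed)
print(eager["tas"].chunks)  # None, i.e. plain NumPy-backed
```

In a real workflow the same flag would more likely be passed through to the file-opening step (for example the `chunks` argument of `xr.open_dataset`), but that depends on how KAPy structures its import code.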

Describe alternatives you've considered There are some issues to consider

markpayneatwork commented 6 months ago

This commit https://github.com/Klimaatlas/KAPy/commit/d6380d6c95b1fa17ddce5461f2c3f427d8590464 restructures the import functionality to avoid xarray automatically using dask. At this point, KAPy is therefore an in-memory processing tool. We may need to fix this in the future.

markpayneatwork commented 6 months ago

dask and lazy loading are also very closely coupled: for some datasets, it may be advantageous not to write out the intermediate file, but instead just return the xarray object. How this works best with cdo sellonlatbox subsetting is unclear, but the two things need to be considered together.
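
As a rough illustration of returning the xarray object instead of writing an intermediate file, the sketch below keeps a box subset and a reduction lazy until the result is explicitly computed. Note that this swaps in xarray's own label-based `.sel` for the subsetting; it is not cdo sellonlatbox, and whether the two can be combined cleanly is exactly the open question above. The function name `subset_box`, the variable name `tas`, and the coordinate names are assumptions.

```python
import numpy as np
import xarray as xr


def subset_box(ds, lon_min, lon_max, lat_min, lat_max):
    """Hypothetical lon/lat box subset done in xarray (not cdo)."""
    # Assumes "lon" and "lat" are ascending 1-D coordinates.
    return ds.sel(lon=slice(lon_min, lon_max), lat=slice(lat_min, lat_max))


# Synthetic, dask-backed dataset standing in for a lazily opened file.
ds = xr.Dataset(
    {"tas": (("time", "lat", "lon"), np.random.rand(24, 11, 21))},
    coords={
        "time": np.arange(24),
        "lat": np.arange(50, 61),
        "lon": np.arange(0, 21),
    },
).chunk({"time": 12})

box = subset_box(ds, 8, 13, 54, 58)   # still lazy, nothing written to disk
result = box["tas"].mean("time")      # still lazy, only a task graph so far
computed = result.compute()           # the work happens here

print(box["tas"].shape)
print(type(computed.data))
```

The caller can then decide whether to write `computed` to disk or pass the lazy `result` on to the next processing step.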

markpayneatwork commented 1 month ago

Saving pickles is the key first step and is now working as of a301785. This achieves 90% of the desired functionality. The remaining 10% will take 90% of the work, and can be handled at a later time :-)
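
For readers unfamiliar with the approach, here is a minimal sketch of round-tripping an xarray object through a pickle using only the standard library. It is not a copy of what a301785 does; the file name and the synthetic dataset are placeholders.

```python
import os
import pickle
import tempfile

import numpy as np
import xarray as xr

# Synthetic dataset standing in for an intermediate result.
ds = xr.Dataset(
    {"tas": (("time", "lat", "lon"), np.random.rand(6, 3, 4))},
    coords={
        "time": np.arange(6),
        "lat": np.linspace(54.0, 58.0, 3),
        "lon": np.linspace(8.0, 13.0, 4),
    },
    attrs={"source": "synthetic example"},
)

# Hypothetical output location; a real workflow would choose its own path.
path = os.path.join(tempfile.mkdtemp(), "tas.pkl")

# Write the whole xarray object, including coordinates and attributes.
with open(path, "wb") as f:
    pickle.dump(ds, f, protocol=pickle.HIGHEST_PROTOCOL)

# Read it back as a ready-to-use xarray object.
with open(path, "rb") as f:
    restored = pickle.load(f)

identical = ds.identical(restored)
print(identical)
```

Pickles are tied to the Python environment that wrote them and should only be loaded from trusted sources, which is one reason they suit intermediate files better than final outputs.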