fraserwg / pvcalc

Other
0 stars 0 forks source link

dask based parallelism #5

Open fraserwg opened 4 years ago

fraserwg commented 4 years ago

I'm currently using thread and processed based parallelism from the python standard library. This requires user input for parallel computation. As I'm using Xarray the natural choice for parallelising the code would be Dask, with each chunk corresponding to a tile.

It would be logical to use dask all the way through - perhaps starting with a mfdataset instead of a list of datasets and performing operation on them one by one. That said, I found that on archer, the process of opening a large mfdataset takes a <> time. Perhaps it would make sense to introduce the Dask based features after the level selection step. I.e. use open_tile on each tile and THEN merge the datasets and use Dask.

Alternatively, if using Dask causes problems, perhaps job lib would be a better, more user friendly way to improve the way the parallelism works.