GFDRR / CCDR-tools

Geoanalytics for climate and disaster risk screening
https://gfdrr.github.io/CCDR-tools/

Parallelize work across RPs #12

Closed. ConnectedSystems closed this 1 year ago

ConnectedSystems commented 1 year ago

This PR includes changes to:

Results have been checked for consistency with the previous output, but not for correctness (see attached files; the file prefixed with seq_ contains the original results from the sequential loop).

seq_PNG_FL_ADM3_pop_EAI.csv PNG_FL_ADM3_pop_EAI.csv
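A consistency check of this kind could be scripted roughly as follows. This is only a sketch, not the procedure actually used for this PR: the function name `csv_results_match` and the tolerance are hypothetical, and it assumes both files list the same rows in the same order.

```python
import csv
import math


def csv_results_match(path_a, path_b, rel_tol=1e-9):
    """Return True if two CSV files hold the same cells.

    Numeric cells are compared with a relative tolerance, all other
    cells as plain text. Row order is assumed to be identical.
    """
    with open(path_a, newline="") as fa, open(path_b, newline="") as fb:
        rows_a = list(csv.reader(fa))
        rows_b = list(csv.reader(fb))

    if len(rows_a) != len(rows_b):
        return False

    for row_a, row_b in zip(rows_a, rows_b):
        if len(row_a) != len(row_b):
            return False
        for a, b in zip(row_a, row_b):
            try:
                same = math.isclose(float(a), float(b), rel_tol=rel_tol)
            except ValueError:
                # At least one cell is not numeric: compare as text.
                same = a == b
            if not same:
                return False
    return True
```

For the attached files this would be called as `csv_results_match("seq_PNG_FL_ADM3_pop_EAI.csv", "PNG_FL_ADM3_pop_EAI.csv")`. Note that NaN cells count as a mismatch here, and that a match only shows the two runs agree with each other, not that either is correct.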

The changes have reduced peak memory use from ~8-9 GB to ~5-6 GB per core.
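Because the memory figure is per core, total RAM caps how many return periods (RPs) can run at once. The sketch below illustrates that arithmetic together with a generic process pool over RPs. It is not the code in this PR: `impact_for_rp` is a hypothetical placeholder for the real per-RP raster work, and the 16 GB machine in the usage note is an assumed example.

```python
from concurrent.futures import ProcessPoolExecutor


def max_workers_for_memory(total_gb, per_worker_gb, n_cores):
    """Largest worker count whose combined peak memory fits in total_gb.

    Always returns at least 1, even if a single worker would not fit.
    """
    return max(1, min(n_cores, int(total_gb // per_worker_gb)))


def impact_for_rp(rp):
    """Hypothetical stand-in for the per-RP work (placeholder arithmetic).

    The real step would read the hazard raster for this return period
    and aggregate exposure per admin unit; here we only return 1 / rp.
    """
    return rp, 1.0 / rp


def run_across_rps(rps, max_workers, executor_cls=ProcessPoolExecutor):
    """Process each return period in its own worker; results keyed by RP."""
    with executor_cls(max_workers=max_workers) as pool:
        # pool.map preserves input order, so output order is deterministic.
        return dict(pool.map(impact_for_rp, rps))
```

With an assumed 16 GB of RAM, 8 cores and ~6 GB per worker, `max_workers_for_memory(16, 6, 8)` gives 2, so most cores would sit idle. A call such as `run_across_rps([10, 100, 1000], max_workers=2)` should sit under an `if __name__ == "__main__":` guard when a process pool is used.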

While this is an improvement, obviously it is not enough.

To support larger datasets, we will have to make further changes to support windowed reading/processing, and perhaps explore the use of dask, which xarray supports.
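The idea behind windowed processing can be sketched without any raster library: split the grid into tiles, read one tile at a time, and fold each tile into a running result so that peak memory depends on the tile size, not the raster size. The names `iter_windows`, `windowed_sum` and `read_window` below are hypothetical. In practice the reader would be a windowed raster read, or xarray opened with a `chunks=` argument so that dask handles the tiling.

```python
def iter_windows(n_rows, n_cols, block_rows, block_cols):
    """Yield (row_off, col_off, height, width) tiles covering the grid.

    Tiles on the bottom and right edges are clipped to the grid bounds.
    """
    for row_off in range(0, n_rows, block_rows):
        height = min(block_rows, n_rows - row_off)
        for col_off in range(0, n_cols, block_cols):
            width = min(block_cols, n_cols - col_off)
            yield row_off, col_off, height, width


def windowed_sum(read_window, n_rows, n_cols, block_rows, block_cols):
    """Sum all cells while holding only one tile in memory at a time.

    read_window(row_off, col_off, height, width) is supplied by the
    caller and must return the tile as a sequence of rows of numbers.
    """
    total = 0.0
    for row_off, col_off, height, width in iter_windows(
        n_rows, n_cols, block_rows, block_cols
    ):
        tile = read_window(row_off, col_off, height, width)
        total += sum(sum(row) for row in tile)
    return total
```

A sum is used only because it combines trivially across tiles. Aggregations per admin unit would need the same pattern with one running total per unit.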

@artessen has done the bulk of the work of separating out the code so that it can be turned into a command-line tool. I suggest we build on his work and use typer for the CLI, since it is simple to use.

For the notebooks, we might want to explore mercury.