CS-SI / eodag

Earth Observation Data Access Gateway
https://eodag.readthedocs.io
Apache License 2.0

Parallel processing for downloading and unzipping #146

Open koleckt opened 3 years ago

koleckt commented 3 years ago

We could save a lot of processing time, especially for large sets of products (time series), if downloading and unzipping (for example, of S1 products) were done in two different threads.

Currently, the CPU is not used at full capacity while downloading, and the network is idle while unzipping.

Doing both in parallel could speed up processing code considerably.
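
To illustrate the idea, here is a minimal sketch of a two-thread pipeline: one thread downloads archives while the other unzips those that have already arrived. It uses only the Python standard library. The names `download_and_unzip`, `unzip`, and the `download` callable are hypothetical and are not part of the eodag API; `download` is assumed to take an item and return the path of a local zip file.

```python
import queue
import threading
import zipfile
from pathlib import Path

_DONE = object()  # sentinel telling the unzip thread that no more archives will come


def unzip(archive_path):
    """Extract a zip archive next to itself and return the target directory."""
    archive_path = Path(archive_path)
    target = archive_path.with_suffix("")  # "product.zip" -> "product"
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(target)
    return target


def download_and_unzip(items, download, extract=unzip):
    """Overlap network-bound downloads with CPU/disk-bound extraction.

    `download` is a hypothetical callable: item -> path of a local zip file.
    Results are returned in download order.
    """
    pending = queue.Queue()
    extracted = []

    def producer():
        try:
            for item in items:
                pending.put(download(item))
        finally:
            # Always signal the end, even if a download raised.
            pending.put(_DONE)

    def consumer():
        while True:
            archive = pending.get()
            if archive is _DONE:
                break
            extracted.append(extract(archive))

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return extracted
```

While archive N is being extracted, archive N+1 is already downloading, so the two resources are used at the same time. This sketch does not propagate exceptions from the worker threads to the caller; a real implementation would need to handle that.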

sbrunato commented 3 years ago

Thanks for this suggestion. We will work on a new feature that provides a way to parallelize product downloads, and we will keep you informed of progress in this issue.

sbrunato commented 3 years ago

120

sbrunato commented 3 years ago

See if there is a way to pass the download method a parameter referencing a generic pool / worker scheduler, compatible with Dask or other solutions.

Accept an executor as a parameter, such as a Dask Client or a concurrent.futures executor, which expose a similar interface. See https://distributed.dask.org/en/latest/client.html
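
As a rough sketch of what such a parameter could look like: the function below relies only on `executor.submit(fn, arg)` returning a future with a `.result()` method, which `concurrent.futures` executors provide and which a Dask Client is expected to provide as well. The names `download_all` and `download_one` are hypothetical, not existing eodag functions.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(products, download_one, executor=None):
    """Download every product, optionally through a caller-supplied executor.

    `download_one` is a hypothetical callable: product -> local path.
    `executor` is any object exposing submit(fn, *args) -> future with
    .result(); when omitted, downloads run sequentially.
    """
    if executor is None:
        return [download_one(p) for p in products]
    futures = [executor.submit(download_one, p) for p in products]
    # Collect results in submission order, not completion order.
    return [f.result() for f in futures]


def demo():
    """Example call with a standard-library thread pool."""
    def fake_download(product):
        # Stand-in for a real download; returns a made-up local path.
        return f"/tmp/{product}.zip"

    with ThreadPoolExecutor(max_workers=4) as pool:
        return download_all(["p1", "p2", "p3"], fake_download, executor=pool)
```

Leaving the choice of executor to the caller means the library does not have to depend on any particular scheduler. With a distributed executor, `download_one` and its arguments must be serializable.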

Some serialization might be needed (already implemented in eodag):