SteffanDavies opened this issue 8 months ago
As I remember, I've shared example notebooks with the recommended solution for this case:
```python
with dask.config.set(scheduler='single-threaded'):
    ...compute()
```
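Expanded into a self-contained, runnable form (the array and the sum are placeholders standing in for the real raster stack and computation):

```python
import dask
import dask.array as da

# Placeholder for a large raster stack; real data would come from the
# interferogram processing pipeline.
raster = da.ones((1000, 1000), chunks=(250, 250))

# Single-threaded execution: no worker pool is spawned, memory overhead
# stays minimal, and intermediate chunks are freed as soon as consumed.
with dask.config.set(scheduler='single-threaded'):
    result = raster.sum().compute()

print(result)  # 1000000.0
```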
And applying delayed() to large rasters has side effects, as it prevents the freeing of used resources as soon as possible. Does your approach work for a system with 4GB RAM and 4 cores? Or for one with 16GB RAM and 16 cores? You can try increasing the processing chunk size to greatly reduce the overhead of Dask computations.
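To illustrate the chunk-size point: scheduler overhead grows with the number of tasks, and each chunk contributes at least one task, so the chunk count is a useful proxy you can inspect directly (the sizes here are illustrative):

```python
import dask.array as da

shape = (4000, 4000)
small = da.ones(shape, chunks=(100, 100))    # many small chunks
large = da.ones(shape, chunks=(1000, 1000))  # few large chunks

# numblocks gives the chunk grid per axis; 40x40 = 1600 tasks vs 4x4 = 16.
print(small.numblocks, large.numblocks)  # (40, 40) (4, 4)
```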
I don't have access to hardware with less than 32GB of RAM except for a Raspberry Pi in my drawer. That actually sounds like a fun experiment.
Often, possessing a large amount of relatively slow memory is not advantageous. In my tests, Apple Silicon hosts with 2-4 times less RAM have managed to outperform Intel/AMD hosts. Indeed, processing speed can be significantly increased by eliminating Dask chunking, but at the cost of considerable scalability loss. By the way, on Linux, I've achieved the best results using different memory allocators, as detailed in the Dask documentation.
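On the allocator point, the Dask documentation describes setting the glibc trim threshold via an environment variable before launching the process, so freed memory is returned to the OS sooner (the script name here is a hypothetical placeholder):

```shell
# Assumption: Linux with glibc malloc. Per the Dask docs, a low trim
# threshold releases freed memory back to the OS more aggressively;
# jemalloc via LD_PRELOAD is another commonly swapped-in allocator.
MALLOC_TRIM_THRESHOLD_=65536 python process_rasters.py
```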
Additionally, using 1D unwrapping (PSI), there's no need to process a large area all at once. PyGMTSAR's turbulent noise removal is also 1D, and only stratified phase delay requires 2D processing. When not working in mountainous areas, it's relatively straightforward to divide your area. Alternatively, masking out unused pixels can also expedite the process. There are numerous strategies to enhance processing speed.
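A minimal sketch of the masking idea, using a random array as a stand-in for a phase raster and an arbitrary validity mask (neither reflects PyGMTSAR's actual data model):

```python
import dask.array as da

# Placeholder phase raster and a validity mask (e.g. coherent pixels only).
phase = da.random.random((2000, 2000), chunks=(500, 500))
mask = phase > 0.5  # pretend roughly half the pixels are usable

# Statistics computed only over masked pixels avoid wasted work on
# values that would be discarded anyway.
masked_mean = phase[mask].mean().compute()
print(masked_mean)
```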
This is much faster than looping in Python. The current method can take a very long time (30+ minutes) for large stacks (4000 pairs); have you experienced this?
Benchmark (Dask, 1 worker, 16 threads, 128 GB RAM)

| Scheduler | 30 pairs | 150 pairs |
| --- | --- | --- |
| distributed | 1min 34s | 7min 33s |
| multiprocessing | 53 s | 4min 9s |
| processes | 49.6 s | 3min 54s |
| single-threaded | 3.7 s | 18.4 s |
| sync | 3.44 s | 17.4 s |
| synchronous | 3.68 s | 17.4 s |
| threading | 7.67 s | 33.2 s |
| threads | 7.73 s | 38.3 s |
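For anyone wanting to reproduce a comparison like this, a minimal timing harness along these lines works; the workload below is a small stand-in for the real pair stack, not the actual benchmark code:

```python
import time
import dask.array as da

# Stand-in workload: the real benchmark processed stacks of 30 and 150 pairs.
arr = da.random.random((500, 500, 30), chunks=(500, 500, 1))
task = arr.mean(axis=(0, 1))

for scheduler in ("single-threaded", "threads", "processes"):
    start = time.perf_counter()
    task.compute(scheduler=scheduler)
    elapsed = time.perf_counter() - start
    print(f"{scheduler}: {elapsed:.2f}s")
```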