AlexeyPechnikov / pygmtsar

PyGMTSAR (Python InSAR): Powerful and Accessible Satellite Interferometry
http://insar.dev/
BSD 3-Clause "New" or "Revised" License

Slow computation time and CPU 100% usage across all workers #132

Closed: georgeboldeanu closed this issue 6 months ago

georgeboldeanu commented 6 months ago

Hello! After I compute the interferograms for some pairs (793 pairs across 3 years of Sentinel-1 data) and run the SNAPHU unwrapping, I stumble upon a strange issue when I try to save the unwrapped phase to disk. More exactly, no matter the dimensions of the resulting stack (the same error happens for a stack of shape 793x10000x1000 or 793x960x960), the CPU goes to 100%. All the workers enter this phase at 100% allocation.

To reproduce: the size of the dataset really doesn't matter, the same behaviour occurs. The following line produces the behaviour described above:

    unwrap_sbas = sbas.sync_cube(unwrap_sbas, 'unwrap_sbas')

When I manually set the number of workers with:

    import dask

    with dask.config.set(scheduler='threads', num_workers=12):
        unwrap_sbas = sbas.sync_cube(unwrap_sbas, 'unwrap_sbas')

the process runs and the unwrap_sbas file slowly grows in size, a sign that the process is working correctly but more slowly due to the limited number of workers. Is there any known issue behind this behaviour?
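For reference, a minimal sketch assuming only standard Dask (not part of the original report): the same limit can also be applied globally rather than inside a with block, so that every subsequent computation uses it.

    import dask

    # Without the `with` statement the setting applies to all later computations,
    # not only to the code inside a context block.
    dask.config.set(scheduler='threads', num_workers=12)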

Screenshot 1: CPU usage at full across all workers.

System and software version:

AlexeyPechnikov commented 6 months ago

But what is your problem? The code works and utilizes all your CPU cores as expected. At the same time, your RAM is not overused. SNAPHU unwrapping requires a lot of resources and can last for hours or even days.

georgeboldeanu commented 6 months ago

Hello @AlexeyPechnikov! I came back to this issue: after several tests, the RAM is getting overused, up to full usage, even when I properly configure my Dask client with all the necessary settings (n_workers, n_threads, memory limits and so on). I don't know whether this is related to Dask or to the PyGMTSAR implementation. Do you have any insight into this issue?
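For reference, a minimal sketch of an explicitly bounded Dask client of the kind described above; the worker counts and memory limit are illustrative assumptions, not the actual values used here.

    from dask.distributed import Client

    # Each worker is a separate process; memory_limit is enforced per worker.
    client = Client(
        n_workers=4,             # assumed number of worker processes
        threads_per_worker=2,    # assumed threads inside each worker
        memory_limit='4GB',      # assumed per-worker memory cap
    )
    print(client.dashboard_link)  # dashboard URL for monitoring per-worker CPU and RAM

Note that such limits only cover memory allocated inside the Python workers; an external SNAPHU process spawned during unwrapping is outside Dask's accounting.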

AlexeyPechnikov commented 6 months ago

Unwrapping requires external SNAPHU binary calls, whose resource usage cannot be accounted for in Python. You can apply tiled unwrapping for large grids like the 793 grids of 10000x1000 mentioned above; see the PyGMTSAR example notebooks for the details. There should be no problems for 793 relatively small grids like 960x960.

georgeboldeanu commented 6 months ago

A correct SNAPHU conf file resolved this issue!
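For context, a minimal sketch of the kind of tiling options a SNAPHU configuration file can contain, wrapped in a Python string for illustration; the parameter names come from the standard snaphu.conf template, while the values are assumptions rather than the settings actually used here.

    # Illustrative SNAPHU tiling options (values are assumptions, not the actual conf used).
    snaphu_tiling_conf = """
    NTILEROW   4      # number of tile rows the grid is split into
    NTILECOL   4      # number of tile columns
    ROWOVRLP   200    # pixel overlap between neighbouring tile rows
    COLOVRLP   200    # pixel overlap between neighbouring tile columns
    NPROC      4      # number of tiles unwrapped in parallel, bounding CPU usage
    """

Tiling keeps each SNAPHU run on a small sub-grid, which bounds both the memory and the runtime for large scenes.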