AlexeyPechnikov / pygmtsar

PyGMTSAR (Python InSAR): Powerful and Accessible Satellite Interferometry
http://insar.dev/
BSD 3-Clause "New" or "Revised" License

Error with `s1.download_orbit`: Connection Aborted When Downloading More Than 30 Scenes #138

Closed mhotalebi closed 5 months ago

mhotalebi commented 5 months ago

I'm experiencing an issue with the s1.download_orbit function in my project. When I attempt to download orbits for more than 30 scenes, I receive the following error:

An error occurred: ('connection aborted.', ConnectionResetError(104, 'connection reset by peer'))

This issue occurs regardless of the network I use; I have tried several different networks to rule out connectivity issues on my end. It seems the connection is being reset by the peer when handling larger batches, which could be a timeout or resource limit on the server side. I also tried the following retry loop, but it didn't work either:

# scan the data directory for SLC scenes and download missed orbits
import time

max_attempts = 5   # maximum number of attempts
attempt = 0        # current attempt
success = False    # flag to indicate success

while attempt < max_attempts and not success:
    try:
        attempt += 1
        print(f"Attempt {attempt} of {max_attempts}")
        S1.download_orbits(DATADIR, S1.scan_slc(DATADIR2))
        success = True  # if the download succeeds, set success to True
        print("Download successful.")
    except Exception as e:  # catch any exception (consider specifying exact exceptions)
        print(f"An error occurred: {e}")
        time.sleep(5)  # wait 5 seconds before retrying (to avoid hammering the server)

if not success:
    print("Failed to download after maximum attempts.")

AlexeyPechnikov commented 5 months ago

In case of network issues, you can try downloading the orbits sequentially instead of using multiple parallel downloads (8 by default):

S1.download_orbits(..., n_jobs=1)
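
For unstable connections, it may also help to combine the sequential mode with a simple retry loop. A minimal sketch, reusing S1, DATADIR, and DATADIR2 from the snippet above (the number of attempts and the delay are arbitrary assumptions):

import time

# retry the sequential orbit download a few times before giving up
for attempt in range(1, 6):
    try:
        print(f"Attempt {attempt} of 5")
        S1.download_orbits(DATADIR, S1.scan_slc(DATADIR2), n_jobs=1)
        print("Download successful.")
        break
    except Exception as e:
        print(f"An error occurred: {e}")
        time.sleep(5)  # pause before retrying to avoid hammering the server
else:
    print("Failed to download after the maximum number of attempts.")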
mhotalebi commented 5 months ago

Thanks for your response.

I also have another problem. My model runs perfectly until it reaches sbas.sync_cube(stl_sbas, 'stl_sbas'). I have waited for more than a day, but nothing happens. I also removed this line and ran the rest of the code, but then it cannot get past the following line either; nothing happens on it:

zmin, zmax = np.nanquantile(velocity_sbas, [0.01, 0.99])

Sometimes I see memory warnings indicating that my memory limit is 4 GB and that I'm almost reaching it.
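
If velocity_sbas is a large grid, one way to cut the memory needed for this quantile estimate is to compute it on a spatially subsampled view; a minimal sketch (the stride of 4 is an arbitrary assumption, and velocity_sbas is assumed to be the 2D velocity array from the workflow above):

import numpy as np

# quantiles are robust statistics, so a decimated copy of the grid
# usually gives nearly the same color limits at a fraction of the memory
zmin, zmax = np.nanquantile(velocity_sbas[::4, ::4], [0.01, 0.99])
print(zmin, zmax)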

I actually changed the Dask cluster configuration in my cloud to the following, but it didn't solve the problem:

# Apple Silicon Air Dask initialization for big data
import dask, dask.distributed

# Increase timeouts to work on large datasets in Jupyter notebooks
dask.config.set({'distributed.comm.timeouts.tcp': '60s'})
dask.config.set({'distributed.comm.timeouts.connect': '60s'})

# Set aggressive memory utilization
dask.config.set({'distributed.worker.memory.target': 0.75})
dask.config.set({'distributed.worker.memory.spill': 0.85})
dask.config.set({'distributed.worker.memory.pause': 0.90})
dask.config.set({'distributed.worker.memory.terminate': 0.98})

# Load the most powerful Dask Distributed scheduler
from dask.distributed import Client, LocalCluster

# Cleanup to restart the client and cluster for repeatable runs
if 'client' in globals():
    client.close()
if 'cluster' in globals():
    cluster.close()

# Apple Silicon big.LITTLE 4+4 cores configuration
cluster = LocalCluster(n_workers=3, threads_per_worker=2)
client = Client(cluster)
client

mhotalebi commented 5 months ago

Cluster info: I finally changed the cluster to this configuration and am now running the model; I hope it works.

AlexeyPechnikov commented 5 months ago

Usually, we try to have 4GB+ RAM for every worker; my example configuration uses 3 workers on 16GB RAM, providing about 5.5GB per worker. You can allocate less RAM per worker, but sometimes it may not be sufficient.
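
To make that budget explicit, LocalCluster also accepts a per-worker memory_limit; a minimal sketch for a 16GB machine following the numbers above (the exact value is an assumption to adapt to your own RAM):

from dask.distributed import Client, LocalCluster

# 3 workers with about 5.5GB each on a 16GB machine
# (adjust n_workers and memory_limit to your hardware)
cluster = LocalCluster(n_workers=3, threads_per_worker=2, memory_limit='5.5GB')
client = Client(cluster)
print(client)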