Open hongbo-miao opened 11 months ago
Hi @hongbo-miao, how many files are in the bucket? I'm trying to understand if this is a case of "download many small files" or "download a few huge files".
If there are many small files, we have a couple of options. One is using AWS's built-in retry mechanism to retry failures such as the one above. A minimal example of using that would look like:
```python
from prefect_aws import AwsClientParameters, AwsCredentials, S3Bucket

# Tell the underlying boto3 client to retry failed requests up to 10 times
client_params = AwsClientParameters(config={"retries": {"max_attempts": 10}})
creds = AwsCredentials(aws_client_parameters=client_params, ...)
bucket = S3Bucket(credentials=creds, bucket_name="...", ...)
...
bucket.download_folder_to_path()
```
The other option is to try the bucket's S3Bucket.get_directory method -- internally this method downloads one file at a time (as opposed to S3Bucket.download_folder_to_path, which downloads all files concurrently). This option will be a bit slower but won't overwhelm the bucket with requests.
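A minimal sketch of that second option (bucket and folder names here are hypothetical, and credentials are assumed to come from the default AWS credential chain):

```python
from prefect_aws import AwsCredentials, S3Bucket

# Hypothetical names for illustration -- substitute your own values
creds = AwsCredentials()  # falls back to the default AWS credential chain
bucket = S3Bucket(credentials=creds, bucket_name="my-bucket")

# get_directory fetches files one at a time, which is gentler on S3's request limits
bucket.get_directory(from_path="my/remote/folder", local_path="./local-copy")
```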
If the bucket instead holds a few very large files, we'll need to try something else. Let me know if either of those options helps.
Issue
I have a ~150 GB folder that I am trying to download from S3 to my local machine. Here is my code:
After downloading the folder for about one minute at ~500 MB/s, the task fails with an error.

I feel it is related to https://stackoverflow.com/a/46387660/2000548

Basically, if there are a lot of requests to an S3 bucket in a short time, they will start to fail. Is there any way to download a big folder at once? Thanks! ☺️
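The usual fix for this kind of throttling is to retry throttled requests with backoff. A minimal sketch of that with plain boto3 (bucket and key names are hypothetical), which is the same idea as the AwsClientParameters config above:

```python
import boto3
from botocore.config import Config

# Sketch (names hypothetical): boto3's "adaptive" retry mode backs off and retries
# automatically when S3 starts throttling a burst of requests.
config = Config(retries={"max_attempts": 10, "mode": "adaptive"})
s3 = boto3.client("s3", config=config)
s3.download_file("my-bucket", "my/remote/folder/part-00000.parquet", "part-00000.parquet")
```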