aws / aws-cli

Universal Command Line Interface for Amazon Web Services

Add ability for S3 commands to increase retry count #1092

Open · jamesls opened 9 years ago

jamesls commented 9 years ago

We've seen several issues opened now where, due to a number of variables, the current maximum of 5 attempts is too low. This can be due to an unreliable WAN link, insufficient resources on the machine running the commands, the parallelism for S3 transfers being set too high, and so on.

To help with this, we should provide some mechanism that allows a user to bump up the retry count. The main use case is transferring either a large number of files or large files. In these scenarios you're more willing to retry as many times as needed to get the request to succeed.

See:

https://github.com/aws/aws-cli/issues/1065

ShyneColdchain commented 9 years ago

Correct me if I am wrong - would this potentially help with the issue of a ~190 GB upload to an S3 bucket in the US Standard region via

aws s3 cp DATA.csv s3://BUCKET_NAME/data.csv ?

The upload creates about 900 parts and gets through about 15 of them before failing with:

upload failed: ./DATA.csv to s3://BUCKET_NAME/data.csv HTTPSConnectionPool(host='BUCKET_NAME.s3.amazonaws.com', port=443): Max retries exceeded with url: /data.csv?partNumber=9&uploadId=CgfYBQnTUBVMCmrdy_uvMXOk0vqQcsBl570rE6LCC7aNzHO8wBtn_Y1A.gkP9A35VLpOruZXD6k9pPBIUNmXsQ-- (Caused by <class 'socket.error'>: [Errno 104] Connection reset by peer)

Thank you.

ShyneColdchain commented 9 years ago

^ And the --debug output includes well over 5 retries (I meant to include that).

pingaws commented 9 years ago

Is there any progress on, or a plan for, releasing this feature?

spookylukey commented 8 years ago

For large files, it seems from this line that if any part of an upload fails, the whole upload is cancelled:

https://github.com/aws/aws-cli/blob/develop/awscli/customizations/s3/tasks.py#L259

The problem is that with an unreliable internet connection (e.g. one that drops every 10 minutes) and a large file, there is a very high chance that at least one part of a multipart upload will fail. That means the whole upload gets cancelled, i.e. a very low chance of success overall.

Could these failed parts be re-queued instead of causing cancellation?
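
A hypothetical sketch (not the actual awscli task code) of what per-part retries could look like: a failed part is retried with backoff a bounded number of times before the upload as a whole is abandoned. The function and parameter names here are illustrative only.

```python
import time

def upload_part_with_retries(upload_part, part_number, max_attempts=5, base_delay=1.0):
    """Retry a single part instead of cancelling the whole multipart upload.

    `upload_part` is assumed to be a callable that uploads one part and
    raises on failure (e.g. a connection reset).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return upload_part(part_number)
        except Exception:
            if attempt == max_attempts:
                # Only after exhausting the per-part budget should the
                # caller fall back to cancelling the multipart upload.
                raise
            # Exponential backoff before re-queueing the same part.
            time.sleep(base_delay * (2 ** (attempt - 1)))
```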

spookylukey commented 8 years ago

Also, looking at the code, it seems there are only retries for downloads, not uploads - https://github.com/aws/aws-cli/blob/develop/awscli/customizations/s3/tasks.py. This means that despite the multipart upload feature, large files are very unlikely to succeed if there are issues with the network connection: if any part fails, the whole upload is cancelled.

spookylukey commented 8 years ago

I'd be willing to work on this. However, I'd need some guidance:

1) Uploading needs retry logic added, as it currently has none. Should we just do what DownloadPartTask does (repeat in a loop), or something else? Should it default to the same number of attempts as DownloadPartTask?

2) Should there be separate configuration parameters for download retries/upload retries?

3) Should it be possible to configure infinite retries, and what value should be used for that?

spookylukey commented 8 years ago

@kyleknap I'm offering to work on this - if someone can answer my questions above, I can get going. There are two separate features I guess:

1) retries for uploads, and 2) configuration for the number of retries.

Do you want me to create a new issue for part 1) ?

kyleknap commented 8 years ago

Here are some responses to your previous questions:

1) For upload parts we actually do have retry logic; it lives in botocore: https://github.com/boto/botocore/blob/develop/botocore/retryhandler.py. The maximum number of attempts defaults to 5: https://github.com/boto/botocore/blob/develop/botocore/data/_retry.json#L48. For download parts, though, there is additional retry logic on top of botocore's, so downloads can end up retrying more than 5 times.

2) I think one configuration option would be best here. We see retries happen a lot for multipart copies.

3) No, I do not think infinite retries should be allowed. For uploads we already do exponential backoff, so the wait between retries would grow unreasonably long; it should eventually error out.

I think it should be fine to keep tracking this on this issue. No need for a new issue to be opened.

I think the best approach would be to hook into the botocore logic I linked, with a user-provided value for the maximum number of retries; I believe that is what James was referring to when he opened this issue.
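
To see where that ceiling of 5 comes from, you can load the bundled retry policy yourself. A minimal sketch, assuming the _retry.json layout in botocore at the time (a top-level "retry" key with a "__default__" section):

```python
from botocore.loaders import Loader

# Load botocore's bundled retry policy (the _retry.json file linked above).
retry_config = Loader().load_data('_retry')

# The global ceiling on attempts lives under retry.__default__.max_attempts
# (5 at the time of this issue).
print(retry_config['retry']['__default__']['max_attempts'])
```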

spookylukey commented 8 years ago

Great, thanks so much for the response, hopefully I'll get time to look at this over the Christmas period.

spookylukey commented 8 years ago

Working out how to configure the max_attempts value is proving quite difficult...

There is no documentation for how to do this kind of thing - https://botocore.readthedocs.org/en/latest/index.html - and I generally have the principle of "docs or it doesn't exist".

But digging deeper, here is the chain I followed:

The config for the retries is loaded from _retry.json, via self._loader.load_data('_retry'). This doesn't seem to give any opportunity for passing in other config, except by additional configuration files (via botocore.loaders.Loader.search_paths).

So I can't see any way to configure this programmatically, without changes to botocore.
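
For illustration only, this is roughly the change a programmatic knob would have to make to the data returned by load_data('_retry'); botocore builds its retry handler from that data but offers no public way to hand it a modified copy. The helper below is hypothetical, not a botocore API:

```python
def raise_max_attempts(retry_data, max_attempts):
    # Hypothetical helper: bump the global ceiling in the dict returned by
    # Loader().load_data('_retry'). There is no supported botocore hook for
    # injecting the modified dict back into the retry handler.
    retry_data['retry']['__default__']['max_attempts'] = max_attempts
    return retry_data
```

The "additional configuration files" route would presumably mean shadowing the bundled _retry.json from a directory on the loader's search path (e.g. one added via the AWS_DATA_PATH environment variable), which is fragile at best.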

spookylukey commented 8 years ago

In case anyone else is looking for a workaround, I've found that s3cmd's sync command works well.

thehesiod commented 8 years ago

@spookylukey Mind opening a botocore issue for us? Sounds like this would fit perfectly in the Config class: http://botocore.readthedocs.org/en/latest/reference/config.html
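
In botocore releases well after this discussion, the retry ceiling did become configurable through the Config class suggested above. A minimal sketch of what that looks like per client:

```python
import botocore.session
from botocore.config import Config

# Raise the maximum number of attempts for this client's requests
# (newer botocore versions accept a 'retries' dict on Config).
config = Config(retries={'max_attempts': 10})

session = botocore.session.get_session()
s3 = session.create_client('s3', config=config)
```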

ASayre commented 6 years ago

Good Morning!

We're closing this issue here on GitHub, as part of our migration to UserVoice for feature requests involving the AWS CLI.

This will let us get the most important features to you, by making it easier to search for and show support for the features you care the most about, without diluting the conversation with bug reports.

As a quick UserVoice primer (if not already familiar): after an idea is posted, people can vote on the ideas, and the product team will be responding directly to the most popular suggestions.

We’ve imported existing feature requests from GitHub - Search for this issue there!

And don't worry, this issue will still exist on GitHub for posterity's sake. As it’s a text-only import of the original post into UserVoice, we’ll still be keeping in mind the comments and discussion that already exist here on the GitHub issue.

GitHub will remain the channel for reporting bugs.

Once again, this issue can now be found by searching for the title on: https://aws.uservoice.com/forums/598381-aws-command-line-interface

-The AWS SDKs & Tools Team

This entry can specifically be found on UserVoice at: https://aws.uservoice.com/forums/598381-aws-command-line-interface/suggestions/33168364-add-ability-for-s3-commands-to-increase-retry-coun

jamesls commented 6 years ago

Based on community feedback, we have decided to return feature requests to GitHub issues.

madrobby commented 4 years ago

Hi, is there any movement on this? I have a spotty connection and am literally unable to download any file larger than a few hundred MiB from S3.