cp to s3 - Multipart upload fails intermittently (without apparent reason)

TuomasHeikkila commented 10 years ago

I am using the AWS CLI through a scheduled .bat file, and for no apparent reason, almost every night one of the three uploads fails with "[Errno 10053] An established connection was aborted by the software in your host machine".

After quickly browsing through the debug log, I saw that the upload completed all 152 of 152 parts with -1 files remaining, but the upload still failed. Many of the parts were being retried, though, which led me to suspect that the program may shut down too early (before all retries have completed).

This would require some looking into, in order to form a clear bug report.

danielgtaylor commented 10 years ago

@TuomasHeikkila can you give us any more information to try and reproduce the issue? What version of Python are you using? What version of the AWS CLI? About how many files are being transferred and roughly what sizes are the files? Do you have any special CLI configuration? What command are you running?

If possible, running with --debug and pasting the sanitized (obfuscate or remove any personal or private info) output would be helpful to us.

TuomasHeikkila commented 10 years ago

https://anonfiles.com/file/b766a1f9ddcc36a15ad46da8ec8b84b0

The complete 1.5 MB log file from the failed upload should be available at the link above. As I mentioned in the title, this bug appears extremely intermittently; not once has the same part of any file been a repeat offender.

The CliffsNotes for our setup are as follows:
CLI version: aws-cli/1.3.1 Python/2.7.5 Windows/2008Server
botocore version: 0.35.0

We upload three backup files nightly, ranging from 300 MB to 1 GB. The AWS CLI configuration has not been changed, apart from adding our credentials and the preferred location.

The scheduled .bat just creates a .zip, encrypts the content, uploads it, makes a copy of the upload inside the bucket using s3api copy-object, compares the checksums, and then removes the copied object used for checksumming.

aws cp s3:// --debug
aws s3api copy-object --bucket --copy-source filename --key
aws s3 rm
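
The checksum comparison step could be sketched in Python roughly as follows. This is only an illustration, not the actual batch script. It assumes the ETag reported for the copied object is the plain MD5 hex digest of its content, which is what S3 normally reports for an unencrypted (or SSE-S3) object created by a single copy operation; `local_md5` and `etag_matches` are hypothetical helper names.

```python
import hashlib


def local_md5(path, block_size=1024 * 1024):
    """Return the MD5 hex digest of a file, reading it in blocks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size blocks so large backup files are not loaded at once.
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def etag_matches(path, etag):
    """Compare a local file against an ETag string returned by S3.

    S3 wraps the ETag in double quotes, so they are stripped first.
    Assumption: the ETag is a plain MD5 digest (not a multipart ETag,
    which ends in "-<part count>" and will never match here).
    """
    return local_md5(path) == etag.strip('"').lower()
```

The ETag of the multipart upload itself is not a plain MD5 of the file, which is presumably why the comparison is made against the copied object.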

Last night everything went just fine, with no errors and no problems. Has this kind of behavior been linked to any existing or fixed bug?

danielgtaylor commented 10 years ago

@jamesls any insight into the above? Could this be related to other s3 issues you saw with either large or many files?

jamesls commented 10 years ago

@TuomasHeikkila can you tell me about the server? How many cores does it have? Is it typically under heavy load? If it's an EC2 instance, is it in the same region as S3?

One possibility is that on some machines we are making requests too aggressively, which may cause connection resets. Even though we retry those requests, if we are still too aggressive we may exhaust all our retry attempts, causing the upload to fail.

In that case we either need to put better retry logic in the CLI (if we start to see connection resets, slow down the request rate), or just expose these options to the user (num_threads, retry count, etc.).
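
The first option (slowing down once connection resets appear) could look roughly like the sketch below. It is a minimal illustration of retrying with exponential backoff and jitter, not the CLI's actual retry code; `ConnectionAborted`, `upload_with_backoff`, and the default values are all hypothetical.

```python
import random
import time


class ConnectionAborted(Exception):
    """Stand-in for a socket-level failure such as Errno 10053."""


def upload_with_backoff(upload_part, max_attempts=5, base_delay=0.5,
                        max_delay=30.0, sleep=time.sleep):
    """Call upload_part() until it succeeds or the attempts run out.

    After each failure the upper bound on the wait doubles (capped at
    max_delay), and the actual wait is drawn at random below that bound
    so that parallel uploads do not all retry at the same moment.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return upload_part()
        except ConnectionAborted:
            if attempt == max_attempts:
                # Retries exhausted: surface the error to the caller.
                raise
            bound = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, bound))
```

Here `upload_part` is any zero-argument callable that uploads one part; the `sleep` parameter exists only so the wait can be replaced when experimenting.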

tuoheikk commented 10 years ago

It's a dedicated server located in Finland (the targeted S3 region is Ireland). Specs:

Quad-Core AMD Opteron Processor 1354, 2.20 GHz
2 GB RAM
32-bit Windows 2008 Server Standard, Service Pack 2

We upload our backups starting at 01:00 every night, when there should be no other traffic to the server.

The problem could be regional, if you haven't seen this kind of activity before. Finnish internet traffic is quite poorly routed; it could also be that we are bottlenecking somewhere in Sweden before being routed to the "backbone".

Gaining access to the options would be extremely helpful.

As an update to the situation: it's now been a week without any failures. "Extremely intermittently" is the key phrase.

jamesls commented 10 years ago

I believe this is fixed in the latest version of the CLI. Please let us know if you're still seeing this issue.

aws / aws-cli

cp to s3 - Multipart upload fails intermittently (without apparent reason) #707