aws / aws-cli

Universal Command Line Interface for Amazon Web Services

S3 - RequestTimeout during large files #401

Closed lloydcotten closed 10 years ago

lloydcotten commented 10 years ago

I'm trying to upload a large file (9 GB) and getting a RequestTimeout error using aws s3 mv ...

I haven't fully tested it yet, but it seems like if I run the command over and over it will eventually work.

Here's the debug log from a failed attempt: https://s3.amazonaws.com/nimbus-public/s3_backup.log

I'll post back if I determine that retrying the command several times works or not.

aws version: aws-cli/1.1.2 Python/2.7.3 Windows/2008ServerR2

lloydcotten commented 10 years ago

After multiple retries the command does eventually work on these large files (7-11 GB), but it sometimes takes dozens of retries.

BTW, I'm running the command on an EC2 instance, so there shouldn't be any latency or network issues.

jamesls commented 10 years ago

Looking into this, I believe I know what's causing the issue.

dcg9381 commented 10 years ago

Note, I'm having similar reliability issues moving even larger files (~175 GB) to S3. We've tried mv, sync, and copy with various results. We're running the following:

aws s3 --version
aws-cli/1.1.2 Python/2.6.6 Linux/2.6.32-358.el6.x86_64

Note this is a single (large) file. It's already compressed.

We often see: A client error (RequestTimeout) occurred: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.

Because of the large file size, retrying is extremely expensive. The larger the file size, the less luck we're having.

Is there a better tool that can be used from the Linux command line for moving big data to S3?

dcg9381 commented 10 years ago

Out of 4 attempts yesterday, only one of them was successful. 25% isn't a good enough success rate for us to depend on this for what we're trying to do.

Let me know if you need full logging.

Recent improvements with ver 1.1.2 include giving a valid non-zero return code when we fail, so at least we know that we've failed.

Here's the most recent tail of the output:

2013-10-16 18:05:42,755 - botocore.hooks - DEBUG - Event before-auth.s3: calling handler <function fix_s3_host at 0x2527668>
2013-10-16 18:05:42,755 - botocore.handlers - DEBUG - Checking for DNS compatible bucket for: https://s3.amazonaws.com/AWL-Backup/mysql/innobackup-mysql-2013-10-16.xbstream?uploadId=Y1fx18aYpHc91zTKCYJWrziFTDYGYIWmow80MSK28xrH9RX7ZzxKs61mHKB1opG7YNlIeiSr8fcsSVN5_LSn5j0wBQQfe1GizoFVhC4arAQRDWiGLt_6HNrFu02ej1au
2013-10-16 18:05:42,755 - botocore.handlers - DEBUG - Not changing URI, bucket is not DNS compatible: AWL-Backup
2013-10-16 18:05:42,755 - botocore.auth - DEBUG - Calculating signature using hmacv1 auth.
2013-10-16 18:05:42,755 - botocore.auth - DEBUG - HTTP request method: DELETE
2013-10-16 18:05:42,756 - botocore.auth - DEBUG - StringToSign: DELETE

Thu, 17 Oct 2013 00:05:42 GMT
/AWL-Backup/mysql/innobackup-mysql-2013-10-16.xbstream?uploadId=Y1fx18aYpHc91zTKCYJWrziFTDYGYIWmow80MSK28xrH9RX7ZzxKs61mHKB1opG7YNlIeiSr8fcsSVN5_LSn5j0wBQQfe1GizoFVhC4arAQRDWiGLt_6HNrFu02ej1au
2013-10-16 18:05:42,756 - botocore.endpoint - DEBUG - Sending http request:
2013-10-16 18:05:43,448 - botocore.response - DEBUG - Response Body:

2013-10-16 18:05:43,448 - botocore.hooks - DEBUG - Event needs-retry.s3.AbortMultipartUpload: calling handler <botocore.retryhandler.RetryHandler object at 0x2906b10>
2013-10-16 18:05:43,448 - botocore.retryhandler - DEBUG - No retry needed.
2013-10-16 18:05:43,449 - botocore.hooks - DEBUG - Event after-call.s3.AbortMultipartUpload: calling handler <awscli.errorhandler.ErrorHandler object at 0x2873650>
2013-10-16 18:05:43,449 - awscli.errorhandler - DEBUG - HTTP Response Code: 204

At this point the command exits and returns $? = 1 (failure).

jamesls commented 10 years ago

This should be fixed now. The issue was that the S3 side was closing the connection while we were uploading data. We detect this and automatically retry the request when it happens. However, we need to ensure that if the body is a file-like object (which is the case when cp/mv/sync'ing to S3), we properly reset the stream back to the beginning so that we send the entire body contents again.
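
(A minimal sketch of the stream-reset pattern described above, in plain Python. This is not the actual botocore code; upload_part is a hypothetical stand-in for whatever call actually sends the bytes to S3.)

import socket

def upload_with_retry(body, upload_part, max_attempts=3):
    # body is a seekable, binary-mode file-like object, e.g. open('part.bin', 'rb')
    start = body.tell()
    for attempt in range(1, max_attempts + 1):
        try:
            return upload_part(body)
        except (socket.error, ConnectionResetError):
            if attempt == max_attempts:
                raise
            # The crucial step from the fix: rewind the stream so the retry
            # sends the entire body again instead of a truncated tail.
            body.seek(start)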

dcg9381 commented 10 years ago

James, thanks. I assume the fix is deployed. We've got an overnight cron to retest. If it fails, I'll be back at it next week.

garnaat commented 10 years ago

The fix is in our develop branch now. It will be incorporated into our next release soon.

dcg9381 commented 10 years ago

garnaat, what's the release schedule, or where can I watch for it?

jamesls commented 10 years ago

The 1.2.0 release, which contains this bug fix, is now out.

dcg9381 commented 10 years ago

Thank you!

hubertott commented 10 years ago

Just got this error uploading a 6.6 GB file:

"A client error (RequestTimeout) occurred: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed."
[root@digital ~]# aws s3 --version
aws-cli/1.2.0 Python/2.6.6 Linux/2.6.32-358.6.1.el6.x86_64

Debug output from 2nd attempt:

2013-10-20 20:49:05,319 - awscli.customizations.s3.tasks - DEBUG - Part number 477 completed for filename: FILE.tar
2013-10-20 20:49:05,348 - awscli.customizations.s3.executer - DEBUG - Received print task: {'message': u'upload: ..FILE.tar to s3://FILE.tar', 'total_parts': 483, 'error': False}
2013-10-20 20:49:06,642 - botocore.response - DEBUG - Response Body:
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>RequestTimeout</Code><Message>Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.</Message><RequestId>REQUEST_ID</RequestId><HostId>HOST_ID</HostId></Error>
2013-10-20 20:49:06,643 - botocore.hooks - DEBUG - Event needs-retry.s3.UploadPart: calling handler <botocore.retryhandler.RetryHandler object at 0x31844d0>
2013-10-20 20:49:06,643 - botocore.retryhandler - DEBUG - No retry needed.
2013-10-20 20:49:06,643 - botocore.hooks - DEBUG - Event after-call.s3.UploadPart: calling handler <awscli.errorhandler.ErrorHandler object at 0x2f4bdd0>
2013-10-20 20:49:06,643 - awscli.errorhandler - DEBUG - HTTP Response Code: 400
2013-10-20 20:49:06,644 - awscli.customizations.s3.tasks - DEBUG - Error during part upload: A client error (RequestTimeout) occurred: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/awscli/customizations/s3/tasks.py", line 148, in __call__
    self._filename.service, 'UploadPart', params)
  File "/usr/lib/python2.6/site-packages/awscli/customizations/s3/utils.py", line 115, in operate
    http_response, response_data = operation.call(**kwargs)
  File "/usr/lib/python2.6/site-packages/botocore/operation.py", line 82, in call
    parsed=response[1])
  File "/usr/lib/python2.6/site-packages/botocore/session.py", line 550, in emit
    return self._events.emit(event_name, **kwargs)
  File "/usr/lib/python2.6/site-packages/botocore/hooks.py", line 158, in emit
    response = handler(**kwargs)
  File "/usr/lib/python2.6/site-packages/awscli/errorhandler.py", line 50, in __call__
    raise ClientError(msg)
ClientError: A client error (RequestTimeout) occurred: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.
2013-10-20 20:49:06,712 - awscli.customizations.s3.executer - DEBUG - Received print task: {'message': u'upload failed: ..FILE.tar to s3://FILE.tar\nA client error (RequestTimeout) occurred: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.', 'error': True}
upload failed: ..FILE.tar to s3://FILE.tar
A client error (RequestTimeout) occurred: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.
2013-10-20 20:49:07,128 - botocore.response - DEBUG - Response Body:

2013-10-20 20:49:07,128 - botocore.hooks - DEBUG - Event needs-retry.s3.UploadPart: calling handler <botocore.retryhandler.RetryHandler object at 0x31844d0>
2013-10-20 20:49:07,128 - botocore.retryhandler - DEBUG - No retry needed.
2013-10-20 20:49:07,128 - botocore.hooks - DEBUG - Event after-call.s3.UploadPart: calling handler <awscli.errorhandler.ErrorHandler object at 0x2f4bdd0>
2013-10-20 20:49:07,128 - awscli.errorhandler - DEBUG - HTTP Response Code: 200
2013-10-20 20:49:07,129 - awscli.customizations.s3.tasks - DEBUG - Part number 478 completed for filename: FILE.tar
2013-10-20 20:49:07,149 - awscli.customizations.s3.executer - DEBUG - Error calling task: Upload has been cancelled.
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/awscli/customizations/s3/executer.py", line 104, in run
    function()
  File "/usr/lib/python2.6/site-packages/awscli/customizations/s3/tasks.py", line 339, in __call__
    parts = self._upload_context.wait_for_parts_to_finish()
  File "/usr/lib/python2.6/site-packages/awscli/customizations/s3/tasks.py", line 437, in wait_for_parts_to_finish
    raise UploadCancelledError("Upload has been cancelled.")
UploadCancelledError: Upload has been cancelled.
2013-10-20 20:49:07,177 - awscli.customizations.s3.executer - DEBUG - Received print task: {'message': u'upload: ..FILE.tar to s3://FILE.tar', 'total_parts': 483, 'error': False}
2013-10-20 20:49:08,119 - botocore.response - DEBUG - Response Body:

2013-10-20 20:49:08,119 - botocore.hooks - DEBUG - Event needs-retry.s3.UploadPart: calling handler <botocore.retryhandler.RetryHandler object at 0x31844d0>
2013-10-20 20:49:08,119 - botocore.retryhandler - DEBUG - No retry needed.
2013-10-20 20:49:08,120 - botocore.hooks - DEBUG - Event after-call.s3.UploadPart: calling handler <awscli.errorhandler.ErrorHandler object at 0x2f4bdd0>
2013-10-20 20:49:08,120 - awscli.errorhandler - DEBUG - HTTP Response Code: 200
2013-10-20 20:49:08,120 - awscli.customizations.s3.tasks - DEBUG - Part number 480 completed for filename: FILE.tar
2013-10-20 20:49:08,142 - awscli.customizations.s3.executer - DEBUG - Received print task: {'message': u'upload: ..FILE.tar to s3://FILE.tar', 'total_parts': 483, 'error': False}

jamesls commented 10 years ago

Could you check whether there's a traceback earlier in the debug logs? Generally, we've seen that the RequestTimeout error occurs because something earlier in the upload triggered a retry and we weren't properly resetting the IO streams on the retry attempt, but this should be fixed in 1.2.0. I'd like to see what caused the initial retry.

I'm also trying to reproduce this issue on v1.2.0. I'll update with what I find.

hubertott commented 10 years ago

For clarification, I was using the --recursive option on a folder which contained two sub-folders with a file in each. I have just tried the upload again, explicitly specifying only the .tar file, and it went through fine.

I will retry the folder with the --recursive option to get the debug logs you have requested.

bravilli commented 10 years ago

So the request timeout issue is not resolved yet? I am trying to upload a big file via Rails CarrierWave, and it also hits the RequestTimeout error. And it is a really critical issue.

Is there anyone who resolved this issue?

dcg9381 commented 10 years ago

I've largely given up on the AWS CLI over a home connection. The connectivity and retry logic isn't robust enough to make this a viable solution. We time out, retry, and eventually fail on large upload requests.

jamesonjlee commented 10 years ago

seems to work flawlessly on 1.3.12 :+1:

On 1.2.1 I got:

upload failed: ./bigzip.zip to s3://mybucket-test-s3/bigzip.zip
A client error (RequestTimeout) occurred: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.

I see a counter,

Completed 170 of 185 part(s) with -169 file(s) remaining

that decreases with each error message.
This is an aws s3 cp command from an EC2 instance in the same region; bigzip.zip is about 1.3 GB in size.

samlambert commented 10 years ago

I am seeing the same issue with a large file. Getting Max retries exceeded with url (Caused by <class 'socket.error'>: [Errno 104] Connection reset by peer). There doesn't seem to be a way to

philippgerard commented 9 years ago

Same problem here with

aws-cli/1.2.9 Python/3.4.0 Linux/3.13.0-29-generic

naren3k commented 7 years ago

Hi, I'm getting this even for 2 KB files through DataPower. However, 1 KB files work fine.

techdragon commented 7 years ago

Still appears to be a problem. Brand new Ubuntu VM, installed AWS CLI tools, aws-cli/1.10.1 Python/3.5.2 Linux/4.4.0-38-generic botocore/1.3.23

Once more I face the dreaded ConnectionResetError 104!

upload failed: 1.pdf to s3://mybucket/1.pdf ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
upload failed: 1.pdf to s3://mybucket/1.pdf ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
upload failed: 2.pdf to s3://mybucket/2.pdf ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
upload failed: 3.jpg to s3://mybucket/3.jpg ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

teamkde commented 7 years ago

This happens when using the AWS UI to upload large files, too.

ghost commented 7 years ago

Hey guys, I am seeing this issue too. I get it while trying to do a multipart upload. I thought my chunk size was too large, but alas I changed it to 5MB and it still gets a timeout error.

This needs to be reopened.
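
(For anyone scripting the same upload directly in Python instead of the CLI, the part size and parallelism can be pinned down explicitly through boto3's transfer configuration. A minimal sketch, assuming boto3 is installed; the bucket name, key, and sizes are placeholders, and this only makes the chunk-size experiment reproducible rather than fixing the timeout itself.)

import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client('s3')

# Upload in 64 MB parts with modest parallelism; both values are illustrative.
config = TransferConfig(multipart_chunksize=64 * 1024 * 1024, max_concurrency=4)

s3.upload_file('bigfile.bin', 'my-bucket', 'bigfile.bin', Config=config)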

sshaw commented 7 years ago

Same problem. I ended up using good ol' s3cmd:

s3cmd --continue-put put 5GB_file s3://some-bucket

Worked like a charm. Who says s3cmd is dead? :neckbeard: 📁 ➡️ 🌎 ➡️ 📬

tobias-khs commented 7 years ago

Seeing this issue with:

> aws s3 --version
> aws-cli/1.11.44 Python/2.7.3 Linux/3.7.10-1.40-desktop botocore/1.5.7

20 GB file. Retrying usually works, but still

b-xk commented 7 years ago

Also seeing this with

aws s3 --version
aws-cli/1.11.56 Python/2.7.12+ Linux/4.6.0-kali1-amd64 botocore/1.5.9

Using both the S3 Accelerate endpoint and the non-accelerated endpoint gave the same 'connection aborted' error as described above, even after trying 10-12 times with a 3 GB file (us-west-2).

Interestingly/anecdotally: removing the following option from the 's3 cp' command allowed the copy to complete on the very next attempt:

--sse AES256

This suggests to me that there is a (regression?) issue with the way the confirmation of part-completions from the server side is being handled, which is likely made worse/slower when SSE is enabled. It looks like the client may be timing out too quickly?

ossie-git commented 6 years ago

This issue isn't specific to large files. I was just testing something and tried to upload a copy of the WordPress source files. After about 200 files, it starts to give me ConnectionResetError(104, 'Connection reset by peer') for files that are 300 KB in size and even smaller.

aws s3 --version
aws-cli/1.11.13 Python/3.5.2 Linux/4.4.0-1031-aws botocore/1.4.70

ossie-git commented 6 years ago

Just a small update. It seems to have been fixed in some later version (and even runs a lot faster). I checked out the latest version using pip install --upgrade awscli so my version was as follows:

aws s3 --version
aws-cli/1.11.138 Python/3.5.2 Linux/4.4.0-1031-aws botocore/1.6.5

Re-ran the command and it successfully copied all the files (and it was a lot faster than the default version that you install with Ubuntu 16.04 LTS).

dcg9381 commented 6 years ago

Thank you for the update..

tgirgin23 commented 6 years ago

This hasn't been fixed yet (running latest aws-cli). I did get it to work by setting --cli-read-timeout to 0.

aws --cli-read-timeout 0 s3 cp s3://file .

The file cp/mv completed successfully after doing so.
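
(For anyone hitting the same wall from a script rather than the CLI, the closest knobs in botocore are the client's read timeout and retry count. A minimal sketch, assuming boto3/botocore are installed; the 600-second timeout and bucket/key names are arbitrary examples, not recommendations.)

import boto3
from botocore.config import Config

# Raise the socket read timeout well above the default (60 seconds) and allow
# more retries -- roughly the scripted analogue of --cli-read-timeout.
cfg = Config(read_timeout=600, retries={'max_attempts': 10})
s3 = boto3.client('s3', config=cfg)

s3.upload_file('bigfile.bin', 'my-bucket', 'bigfile.bin')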

opyh commented 4 years ago

Experiencing the same issue with a 49 GB file; --cli-read-timeout 0 didn't help here.

aws s3 --version
aws-cli/1.14.44 Python/3.6.9 Linux/4.19.0-0.bpo.6-amd64 botocore/1.8.48

I wonder why the protocol can't fall back to polling instead of relying on a socket connection staying open for a long time while the chunks are being assembled on the server.

ham1 commented 3 years ago

I had this issue today as well, --cli-read-timeout 0 did nothing for me.

I "solved" it by using rclone instead - it worked flawlessly.

mdsp0292 commented 3 years ago

I'm facing the same issue as well when trying to upload a zip file of size 800 MiB.

An error occurred (RequestTimeout) when calling the UploadPart operation (reached max retries: 2): Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.

aws s3 --version
aws-cli/2.1.37 Python/3.8.8 Linux/4.15.0-123-generic exe/x86_64.ubuntu.18 prompt/off

Command used: aws --cli-read-timeout 0 s3 cp

asdf01 commented 2 years ago

It's coming up to 8 years since this issue was raised. This issue still exists. Reopen this ticket and fix it! Jeff Bezos has gone to space already, and we still can't reliably copy large files to AWS S3. We are sending terabytes of data halfway around the world in our repeated attempts to upload a simple 200 GB file.

Just because people are silent on this issue doesn't mean this issue doesn't exist. It either means they've found their own workaround through 3rd party tools or moved on to other cloud providers.

AWS purports to be big on security but can't do basic sheet like this. This forces AWS customers into providing their AWS security credentials to questionable 3rd party tools to achieve the most basic of things like uploading a file.

AWS is really becoming a joke of a cloud provider when they leave fundamental issues like this unresolved for 7+ years. I'm becoming a clown of an employee to my employer for continuing to choose AWS amid all these issues.

Reopen this ticket and fix it. You guys could have chosen to address this issue in one of many ways. You could have provided:

Don't choose to do nothing like a number of other AWS issues. We are getting used to celebrating the anniversary of other unresolved AWS issues in the issue comments just to entertain the other issue followers as a joke. We are certainly not getting any attention from the AWS employees.

Customers are not going to continually beg for something if they are not getting any attention from AWS. They will still be talking about AWS, just not on your platforms. Don't let this situation devolve into a surprise Pikachu face moment for AWS when all of your customers move on to a cloud provider that has the obviously necessary features and responds to its customers.

In summary: It's ducking 2021, Jeff Bezos is in space already. We are not asking for flying cars. Just fix the ducking aws s3 cp retry issue.

Wambosa commented 2 years ago

@asdf01, I got some much-needed laughing relief from your comment. It's true that AWS has some glaring holes in it while they forge ahead with other solutions.

It's been a while since I was stung by this; my team is dealing with it now, and unfortunately we might have to homebrew our own wrapper around it.

I can see that this repo has over 400 issues at the time of writing. It must be hard for @jamesls to manage alone. Not sure what is going on back there, as I know we'd have trouble supporting a repo with so many issues. It can be so overwhelming; however, I must second the request above to not forget about the architects who are loyal to AWS, driving millions and millions of dollars to the service each time we build a scaling solution.

It is hard to track how much market-share loss issues like this cause, but I promise you they do.

biajoeknee commented 2 years ago

Facing this issue now when uploading files greater than 1 MB to S3, although in my case it's being done using the HTTP API. However, after researching this issue for hours now, it appears that it is experienced not only across different language libraries (JavaScript SDK, Java SDK, etc.), but also across different means of interacting with S3 (HTTP API, CLI, SDKs). I think at this point it would be a huge relief for you to just communicate what the problem is, even if you have absolutely no intention of fixing it. At least then we can have some closure, stop talking about it, and move on.