gstaubli opened 9 years ago
@chouhanyang - ping
Sorry to hear that. This seems to be a weird race condition, or maybe a conflict in the temp-file namespace. It would help if you could run with the --debug option to get a more detailed log. In the meantime, I can try to add more instrumentation to figure out what's going on here.
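If it's a temp-name conflict, the naming scheme would look roughly like this (a sketch of the destination-path-plus-random-suffix pattern; hypothetical helper, not s4cmd's actual code):

import random
import string

def temp_name(dest_path, n=15):
    # dest.lzo -> dest.lzo-<15 random chars>.tmp; a true conflict would need
    # two threads drawing the same suffix for the same destination, which
    # should be astronomically rare.
    suffix = ''.join(random.choice(string.ascii_lowercase + string.digits)
                     for _ in range(n))
    return '%s-%s.tmp' % (dest_path, suffix)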
I ran a test downloading 483 LZO files from an S3 "directory" in debug mode; here are the last several lines, including the error:
(D)s4cmd.py:161 << wrapper(<ThreadUtil(Thread-27, started daemon 47975866906944)>, None, u's3://some-s3-bucket-of-reports/some_directory/part-m-00063.lzo', u'./some_directory/part-m-00063.lzo'): None
(D)s4cmd.py:161 >> wrapper(<ThreadUtil(Thread-27, started daemon 47975866906944)>, None, u's3://some-s3-bucket-of-reports/some_directory/part-m-00129.lzo', u'./some_directory/part-m-00129.lzo')
(D)s4cmd.py:161 >> _kick_off_downloads(<ThreadUtil(Thread-27, started daemon 47975866906944)>, <__main__.S3URL instance at 0x2ba26b0e9320>, <Bucket: some-s3-bucket-of-reports>, u's3://some-s3-bucket-of-reports/some_directory/part-m-00129.lzo', u'./some_directory/part-m-00129.lzo')
(D)s4cmd.py:161 >> mkdirs(<ThreadUtil(Thread-27, started daemon 47975866906944)>, u'./some_directory/part-m-00129.lzo-cx481ufrsz3pa0o.tmp')
(D)s4cmd.py:161 << mkdirs(<ThreadUtil(Thread-27, started daemon 47975866906944)>, u'./some_directory/part-m-00129.lzo-cx481ufrsz3pa0o.tmp'): None
(D)s4cmd.py:161 >> update_privilege(<ThreadUtil(Thread-30, started daemon 47975898376512)>, u's3://some-s3-bucket-of-reports/some_directory/part-m-00027.lzo', u'./some_directory/part-m-00027.lzo-gf9a06jpnztd18v.tmp')
(D)s4cmd.py:161 << update_privilege(<ThreadUtil(Thread-30, started daemon 47975898376512)>, u's3://some-s3-bucket-of-reports/some_directory/part-m-00027.lzo', u'./some_directory/part-m-00027.lzo-gf9a06jpnztd18v.tmp'): None
(D)s4cmd.py:161 >> _verify_file_size(<ThreadUtil(Thread-30, started daemon 47975898376512)>, <Key: some-s3-bucket-of-reports,some_directory/part-m-00027.lzo>, u'./some_directory/part-m-00027.lzo-gf9a06jpnztd18v.tmp')
(D)s4cmd.py:161 << _verify_file_size(<ThreadUtil(Thread-30, started daemon 47975898376512)>, <Key: some-s3-bucket-of-reports,some_directory/part-m-00027.lzo>, u'./some_directory/part-m-00027.lzo-gf9a06jpnztd18v.tmp'): None
(D)s4cmd.py:161 << _kick_off_downloads(<ThreadUtil(Thread-30, started daemon 47975898376512)>, <__main__.S3URL instance at 0x2ba22f90f5f0>, <Bucket: some-s3-bucket-of-reports>, u's3://some-s3-bucket-of-reports/some_directory/part-m-00027.lzo', u'./some_directory/part-m-00027.lzo'): None
(D)s4cmd.py:161 << wrapper(<ThreadUtil(Thread-30, started daemon 47975898376512)>, None, u's3://some-s3-bucket-of-reports/some_directory/part-m-00027.lzo', u'./some_directory/part-m-00027.lzo'): None
(D)s4cmd.py:161 >> wrapper(<ThreadUtil(Thread-30, started daemon 47975898376512)>, None, u's3://some-s3-bucket-of-reports/some_directory/part-m-00130.lzo', u'./some_directory/part-m-00130.lzo')
(D)s4cmd.py:161 >> _kick_off_downloads(<ThreadUtil(Thread-30, started daemon 47975898376512)>, <__main__.S3URL instance at 0x2ba280ea5518>, <Bucket: some-s3-bucket-of-reports>, u's3://some-s3-bucket-of-reports/some_directory/part-m-00130.lzo', u'./some_directory/part-m-00130.lzo')
(D)s4cmd.py:161 >> mkdirs(<ThreadUtil(Thread-30, started daemon 47975898376512)>, u'./some_directory/part-m-00130.lzo-8wg4jn523kzmsub.tmp')
(D)s4cmd.py:161 << mkdirs(<ThreadUtil(Thread-30, started daemon 47975898376512)>, u'./some_directory/part-m-00130.lzo-8wg4jn523kzmsub.tmp'): None
(D)s4cmd.py:161 >> update_privilege(<ThreadUtil(Thread-7, started daemon 47975582550336)>, u's3://some-s3-bucket-of-reports/some_directory/part-m-00005.lzo', u'./some_directory/part-m-00005.lzo-nw9dsi06zyq1ta2.tmp')
(D)s4cmd.py:161 << update_privilege(<ThreadUtil(Thread-7, started daemon 47975582550336)>, u's3://some-s3-bucket-of-reports/some_directory/part-m-00005.lzo', u'./some_directory/part-m-00005.lzo-nw9dsi06zyq1ta2.tmp'): None
(D)s4cmd.py:161 >> _verify_file_size(<ThreadUtil(Thread-7, started daemon 47975582550336)>, <Key: some-s3-bucket-of-reports,some_directory/part-m-00005.lzo>, u'./some_directory/part-m-00005.lzo-nw9dsi06zyq1ta2.tmp')
(D)s4cmd.py:161 << _verify_file_size(<ThreadUtil(Thread-7, started daemon 47975582550336)>, <Key: some-s3-bucket-of-reports,some_directory/part-m-00005.lzo>, u'./some_directory/part-m-00005.lzo-nw9dsi06zyq1ta2.tmp'): None
(D)s4cmd.py:161 << _kick_off_downloads(<ThreadUtil(Thread-7, started daemon 47975582550336)>, <__main__.S3URL instance at 0x2ba22b2d2998>, <Bucket: some-s3-bucket-of-reports>, u's3://some-s3-bucket-of-reports/some_directory/part-m-00005.lzo', u'./some_directory/part-m-00005.lzo'): None
(D)s4cmd.py:161 << wrapper(<ThreadUtil(Thread-7, started daemon 47975582550336)>, None, u's3://some-s3-bucket-of-reports/some_directory/part-m-00005.lzo', u'./some_directory/part-m-00005.lzo'): None
(D)s4cmd.py:161 >> wrapper(<ThreadUtil(Thread-7, started daemon 47975582550336)>, None, u's3://some-s3-bucket-of-reports/some_directory/part-m-00131.lzo', u'./some_directory/part-m-00131.lzo')
(D)s4cmd.py:161 >> _kick_off_downloads(<ThreadUtil(Thread-7, started daemon 47975582550336)>, <__main__.S3URL instance at 0x2ba280f5a5a8>, <Bucket: some-s3-bucket-of-reports>, u's3://some-s3-bucket-of-reports/some_directory/part-m-00131.lzo', u'./some_directory/part-m-00131.lzo')
(D)s4cmd.py:161 >> update_privilege(<ThreadUtil(Thread-52, started daemon 47976129415488)>, u's3://some-s3-bucket-of-reports/some_directory/part-m-00051.lzo', u'./some_directory/part-m-00051.lzo-6u9r43nvsxi0gcz.tmp')
(D)s4cmd.py:161 >> mkdirs(<ThreadUtil(Thread-7, started daemon 47975582550336)>, u'./some_directory/part-m-00131.lzo-vkx45jmoy6f789p.tmp')
(D)s4cmd.py:161 << mkdirs(<ThreadUtil(Thread-7, started daemon 47975582550336)>, u'./some_directory/part-m-00131.lzo-vkx45jmoy6f789p.tmp'): None
(D)s4cmd.py:161 << update_privilege(<ThreadUtil(Thread-52, started daemon 47976129415488)>, u's3://some-s3-bucket-of-reports/some_directory/part-m-00051.lzo', u'./some_directory/part-m-00051.lzo-6u9r43nvsxi0gcz.tmp'): None
(D)s4cmd.py:161 >> _verify_file_size(<ThreadUtil(Thread-52, started daemon 47976129415488)>, <Key: some-s3-bucket-of-reports,some_directory/part-m-00051.lzo>, u'./some_directory/part-m-00051.lzo-6u9r43nvsxi0gcz.tmp')
(D)s4cmd.py:161 << _verify_file_size(<ThreadUtil(Thread-52, started daemon 47976129415488)>, <Key: some-s3-bucket-of-reports,some_directory/part-m-00051.lzo>, u'./some_directory/part-m-00051.lzo-6u9r43nvsxi0gcz.tmp'): None
(D)s4cmd.py:161 << _kick_off_downloads(<ThreadUtil(Thread-52, started daemon 47976129415488)>, <__main__.S3URL instance at 0x2ba246233dd0>, <Bucket: some-s3-bucket-of-reports>, u's3://some-s3-bucket-of-reports/some_directory/part-m-00051.lzo', u'./some_directory/part-m-00051.lzo'): None
(D)s4cmd.py:161 << wrapper(<ThreadUtil(Thread-52, started daemon 47976129415488)>, None, u's3://some-s3-bucket-of-reports/some_directory/part-m-00051.lzo', u'./some_directory/part-m-00051.lzo'): None
(D)s4cmd.py:161 >> wrapper(<ThreadUtil(Thread-52, started daemon 47976129415488)>, None, u's3://some-s3-bucket-of-reports/some_directory/part-m-00132.lzo', u'./some_directory/part-m-00132.lzo')
(D)s4cmd.py:161 >> _kick_off_downloads(<ThreadUtil(Thread-52, started daemon 47976129415488)>, <__main__.S3URL instance at 0x2ba25847bbd8>, <Bucket: some-s3-bucket-of-reports>, u's3://some-s3-bucket-of-reports/some_directory/part-m-00132.lzo', u'./some_directory/part-m-00132.lzo')
(D)s4cmd.py:161 >> mkdirs(<ThreadUtil(Thread-52, started daemon 47976129415488)>, u'./some_directory/part-m-00132.lzo-jfordlqmhey6347.tmp')
(D)s4cmd.py:161 << mkdirs(<ThreadUtil(Thread-52, started daemon 47976129415488)>, u'./some_directory/part-m-00132.lzo-jfordlqmhey6347.tmp'): None
(D)s4cmd.py:161 >> update_privilege(<ThreadUtil(Thread-50, started daemon 47976108435776)>, u's3://some-s3-bucket-of-reports/some_directory/part-m-00049.lzo', u'./some_directory/part-m-00049.lzo-ipnzl24k3yeshcq.tmp')
(D)s4cmd.py:161 << update_privilege(<ThreadUtil(Thread-50, started daemon 47976108435776)>, u's3://some-s3-bucket-of-reports/some_directory/part-m-00049.lzo', u'./some_directory/part-m-00049.lzo-ipnzl24k3yeshcq.tmp'): None
(D)s4cmd.py:161 >> _verify_file_size(<ThreadUtil(Thread-50, started daemon 47976108435776)>, <Key: some-s3-bucket-of-reports,some_directory/part-m-00049.lzo>, u'./some_directory/part-m-00049.lzo-ipnzl24k3yeshcq.tmp')
(D)s4cmd.py:161 << _verify_file_size(<ThreadUtil(Thread-50, started daemon 47976108435776)>, <Key: some-s3-bucket-of-reports,some_directory/part-m-00049.lzo>, u'./some_directory/part-m-00049.lzo-ipnzl24k3yeshcq.tmp'): None
(D)s4cmd.py:161 << _kick_off_downloads(<ThreadUtil(Thread-50, started daemon 47976108435776)>, <__main__.S3URL instance at 0x2ba2462335f0>, <Bucket: some-s3-bucket-of-reports>, u's3://some-s3-bucket-of-reports/some_directory/part-m-00049.lzo', u'./some_directory/part-m-00049.lzo'): None
(D)s4cmd.py:161 << wrapper(<ThreadUtil(Thread-50, started daemon 47976108435776)>, None, u's3://some-s3-bucket-of-reports/some_directory/part-m-00049.lzo', u'./some_directory/part-m-00049.lzo'): None
(D)s4cmd.py:161 >> wrapper(<ThreadUtil(Thread-50, started daemon 47976108435776)>, None, u's3://some-s3-bucket-of-reports/some_directory/part-m-00133.lzo', u'./some_directory/part-m-00133.lzo')
(D)s4cmd.py:161 >> _kick_off_downloads(<ThreadUtil(Thread-50, started daemon 47976108435776)>, <__main__.S3URL instance at 0x2ba280eb4560>, <Bucket: some-s3-bucket-of-reports>, u's3://some-s3-bucket-of-reports/some_directory/part-m-00133.lzo', u'./some_directory/part-m-00133.lzo')
(D)s4cmd.py:161 >> mkdirs(<ThreadUtil(Thread-50, started daemon 47976108435776)>, u'./some_directory/part-m-00133.lzo-xjg163lkdib7aty.tmp')
(D)s4cmd.py:161 << mkdirs(<ThreadUtil(Thread-50, started daemon 47976108435776)>, u'./some_directory/part-m-00133.lzo-xjg163lkdib7aty.tmp'): None
(E)s4cmd.py:173 [OSError] 2: No such file or directory
(D)s4cmd.py:161 >> update_privilege(<ThreadUtil(Thread-16, started daemon 47975751518528)>, u's3://some-s3-bucket-of-reports/some_directory/part-m-00067.lzo', u'./some_directory/part-m-00067.lzo-jsvo3ah1ei64cp9.tmp')
(D)s4cmd.py:161 << update_privilege(<ThreadUtil(Thread-16, started daemon 47975751518528)>, u's3://some-s3-bucket-of-reports/some_directory/part-m-00067.lzo', u'./some_directory/part-m-00067.lzo-jsvo3ah1ei64cp9.tmp'): None
(D)s4cmd.py:161 >> _verify_file_size(<ThreadUtil(Thread-16, started daemon 47975751518528)>, <Key: some-s3-bucket-of-reports,some_directory/part-m-00067.lzo>, u'./some_directory/part-m-00067.lzo-jsvo3ah1ei64cp9.tmp')
(E)s4cmd.py:173 [OSError] 2: No such file or directory
(E)s4cmd.py:173 [Thread Failure] [Errno 2] No such file or directory: './some_directory/part-m-00067.lzo-jsvo3ah1ei64cp9.tmp'
OK, that's helpful, thanks! My guess is that S3 throttling kicks in partway through, so some files fail to download and their temp files never get created. My suggestion would be to split the job into smaller batches with wildcards, or to reduce the number of threads.
We can definitely provide a more useful error message, BTW.
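To illustrate where the OSError likely surfaces, here's a minimal sketch (a simplification with hypothetical names, not s4cmd's actual _verify_file_size):

import os

def verify_file_size(expected_size, tmp_path):
    # If throttling aborted the transfer before the temp file was created,
    # os.path.getsize() raises OSError errno 2 -- the "[Thread Failure]" above.
    return os.path.getsize(tmp_path) == expected_size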
Unfortunately, using wildcards didn't solve the problem, but the download did run much longer than usual (66 min) before failing, so maybe it helped to some degree. Trying now with fewer threads.
Got another failure even when using --num-threads 16, after downloading 15GB (303 files) over about 80 minutes.
Are you running s4cmd on EC2 or outside AWS?
In this case, the only thing I can suggest is to modify RETRY_DELAY in s4cmd.py. You can set it longer, say 1-5 minutes, in case the network does have issues from time to time. Retrying after 10 seconds may not be long enough.
We plan to implement an exponential delay but haven't gotten to that yet. Hope this helps.
Running on a non-AWS server located in San Francisco. I'll try working with the RETRY_DELAY.
I did another test out of curiosity. I used s4cmd to list all the files in a "directory," extracted the file names from the "ls" output, and fed them to s4cmd via "xargs -P12 -n1 s4cmd get", like so:
time s4cmd.py ls --recursive s3://some-s3-bucket-of-reports/some_directory | cut -d" " -f4 | xargs -P12 -n1 s4cmd.py --num-threads 1 get
Unfortunately, even this failed on at least one file (it succeeded on many others).
Regarding the exponential delay: s3cmd has a simple delay mechanism that may be worth replicating. Would you be interested in a pull request adding an exponential delay?
Not sure what the mechanism is in s3cmd, but I think it's a one-line change at line 378:
time.sleep(RETRY_DELAY)
to
time.sleep(RETRY_DELAY * (2 ** retry))
That's it. Feel free to try it and see if it works for you. Thanks!
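If the plain doubling still isn't enough, a capped, jittered variant (a sketch, not part of s4cmd; assumes the 10-second default RETRY_DELAY mentioned above) keeps concurrent threads from all retrying at the same instant:

import random
import time

RETRY_DELAY = 10  # s4cmd's default, per the discussion above

def backoff_sleep(retry):
    # Exponential backoff capped at 5 minutes, with full jitter.
    delay = min(300, RETRY_DELAY * (2 ** retry))
    time.sleep(random.uniform(0, delay))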
I've only done one test with the exponential delay setting you recommended, but it successfully downloaded 23GB, so that's a great first sample run. I'm running the same test again and will report back with results.
I'm also encountering this bug on v1.5.20. Changing the time.sleep line to use an exponentially increasing delay didn't seem to help.
This has been open for more than a year now, any progress on this?
+1 @dcelasun. I just hit this two days ago: around 4k objects, each roughly 250MB. The s4cmd sync failed 5-7 minutes into the run.
Failing for me as well, using sync from a non-AWS machine. Reducing the number of threads seemed to help.
The new s4cmd 2.0 switched to the boto3 library. Can folks give it a try and see if it is more stable against this issue? Thanks.
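For anyone experimenting, boto3 also exposes a client-side retry budget; a standalone sketch (not wired into s4cmd):

import boto3
from botocore.config import Config

# Ask botocore to retry failed/throttled requests more times than the default.
s3 = boto3.client('s3', config=Config(retries={'max_attempts': 10}))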
It's still failing pretty consistently.
I find that when I try to download multiple gigabytes from S3 to my local filesystem, this error stops the process and the rest of the files never download.
[OSError] 2: No such file or directory
[OSError] 2: No such file or directory
[OSError] 2: No such file or directory
[Thread Failure] [Errno 2] No such file or directory: './my_directory/my_file-thl8fgmwbcnxy36.tmp'
It usually downloads a few hundred of ~1,500 files before erroring out, leaving me with roughly half the directory's actual contents. I've found I can re-run the command multiple times and it will eventually finish, so this may be an edge case that crops up every few hundred transfers.
Here's the command:
s4cmd.py get -r -s s3://my-s3-bucket/my_directory
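Since re-running eventually finishes, the workaround can be scripted; a rough sketch wrapping that same command (the -s sync check skips files that already came down intact):

import subprocess
import time

for attempt in range(10):
    # Re-run until s4cmd exits cleanly; -s skips already-downloaded files.
    rc = subprocess.call(['s4cmd.py', 'get', '-r', '-s',
                          's3://my-s3-bucket/my_directory'])
    if rc == 0:
        break
    time.sleep(30 * (attempt + 1))  # brief pause between attempts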
Any idea what's going on? I use s4cmd just about every day and absolutely love it; once this issue is resolved I'll have nothing but praise for it!