bloomreach / s4cmd

Super S3 command line tool
Apache License 2.0
1.37k stars 211 forks

[OSError] 2: No such file or directory #18

Open gstaubli opened 9 years ago

gstaubli commented 9 years ago

I find that when I try to download multiple gigabytes from S3 to my local filesystem, this error stops the process, leaving the rest of the files undownloaded.

  [OSError] 2: No such file or directory
  [OSError] 2: No such file or directory
  [OSError] 2: No such file or directory
  [Thread Failure] [Errno 2] No such file or directory: './my_directory/my_file-thl8fgmwbcnxy36.tmp'

It usually succeeds in downloading a few hundred of ~1,500 files before it errors out, so I am left with roughly half the directory's actual contents. I have found I can re-run the command multiple times and eventually it will finish, so it may be an edge case that crops up every few hundred transfers.

Here's the command: s4cmd.py get -r -s s3://my-s3-bucket/my_directory

Any idea what's going on? I use s4cmd just about every day and absolutely love it; with this issue resolved I have nothing but praise for it!
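[Editor's note: the re-run-until-complete workaround described above can be scripted. A minimal sketch, assuming the -s sync-check flag lets each re-run skip files that already downloaded correctly; the bucket path and attempt limit are placeholders, not s4cmd settings.]

  import subprocess
  import sys

  # Placeholders: adjust the command and attempt limit to your setup.
  CMD = ["s4cmd.py", "get", "-r", "-s", "s3://my-s3-bucket/my_directory"]
  MAX_ATTEMPTS = 5

  for attempt in range(1, MAX_ATTEMPTS + 1):
      # With -s (sync check), each re-run should fetch only the files the
      # previous attempt missed, so retries converge on a full download.
      if subprocess.call(CMD) == 0:
          break
      sys.stderr.write("attempt %d failed, retrying\n" % attempt)
  else:
      sys.exit("giving up after %d attempts" % MAX_ATTEMPTS)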

viksit commented 9 years ago

@chouhanyang - ping

chouhanyang commented 9 years ago

Sorry to hear that. This seems to be a weird race condition, or maybe a conflict in the temp file name space. It would be helpful if you could run with the --debug option to get a more detailed log. On the other hand, I can try to add more instrumentation to figure out what's going on here.
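[Editor's note: for context on the temp-file-collision theory, the standard collision-proof pattern in Python is tempfile.mkstemp, which creates the file atomically under a guaranteed-unique name. A sketch of the download-to-temp-then-rename idea; this is not s4cmd's actual implementation, and fetch is a hypothetical callback that writes the S3 object body.]

  import os
  import tempfile

  def download_to_temp(local_path, fetch):
      # Create the temp file next to the destination so the final rename
      # stays on the same filesystem (rename is atomic on POSIX).
      dirname = os.path.dirname(local_path) or "."
      prefix = os.path.basename(local_path) + "-"
      fd, tmp_path = tempfile.mkstemp(prefix=prefix, suffix=".tmp", dir=dirname)
      try:
          with os.fdopen(fd, "wb") as f:
              fetch(f)  # hypothetical: writes the S3 object body to f
          os.rename(tmp_path, local_path)
      except Exception:
          os.unlink(tmp_path)
          raise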

gstaubli commented 9 years ago

I ran a test by downloading 483 LZO files from an S3 "directory" in debug mode; here are the last several lines, including the error:

  (D)s4cmd.py:161  << wrapper(<ThreadUtil(Thread-27, started daemon 47975866906944)>, None, u's3://some-s3-bucket-of-reports/some_directory/part-m-00063.lzo', u'./some_directory/part-m-00063.lzo'): None
  (D)s4cmd.py:161  >> wrapper(<ThreadUtil(Thread-27, started daemon 47975866906944)>, None, u's3://some-s3-bucket-of-reports/some_directory/part-m-00129.lzo', u'./some_directory/part-m-00129.lzo')
  (D)s4cmd.py:161  >> _kick_off_downloads(<ThreadUtil(Thread-27, started daemon 47975866906944)>, <__main__.S3URL instance at 0x2ba26b0e9320>, <Bucket: some-s3-bucket-of-reports>, u's3://some-s3-bucket-of-reports/some_directory/part-m-00129.lzo', u'./some_directory/part-m-00129.lzo')
  (D)s4cmd.py:161  >> mkdirs(<ThreadUtil(Thread-27, started daemon 47975866906944)>, u'./some_directory/part-m-00129.lzo-cx481ufrsz3pa0o.tmp')
  (D)s4cmd.py:161  << mkdirs(<ThreadUtil(Thread-27, started daemon 47975866906944)>, u'./some_directory/part-m-00129.lzo-cx481ufrsz3pa0o.tmp'): None
  (D)s4cmd.py:161  >> update_privilege(<ThreadUtil(Thread-30, started daemon 47975898376512)>, u's3://some-s3-bucket-of-reports/some_directory/part-m-00027.lzo', u'./some_directory/part-m-00027.lzo-gf9a06jpnztd18v.tmp')
  (D)s4cmd.py:161  << update_privilege(<ThreadUtil(Thread-30, started daemon 47975898376512)>, u's3://some-s3-bucket-of-reports/some_directory/part-m-00027.lzo', u'./some_directory/part-m-00027.lzo-gf9a06jpnztd18v.tmp'): None
  (D)s4cmd.py:161  >> _verify_file_size(<ThreadUtil(Thread-30, started daemon 47975898376512)>, <Key: some-s3-bucket-of-reports,some_directory/part-m-00027.lzo>, u'./some_directory/part-m-00027.lzo-gf9a06jpnztd18v.tmp')
  (D)s4cmd.py:161  << _verify_file_size(<ThreadUtil(Thread-30, started daemon 47975898376512)>, <Key: some-s3-bucket-of-reports,some_directory/part-m-00027.lzo>, u'./some_directory/part-m-00027.lzo-gf9a06jpnztd18v.tmp'): None
  (D)s4cmd.py:161  << _kick_off_downloads(<ThreadUtil(Thread-30, started daemon 47975898376512)>, <__main__.S3URL instance at 0x2ba22f90f5f0>, <Bucket: some-s3-bucket-of-reports>, u's3://some-s3-bucket-of-reports/some_directory/part-m-00027.lzo', u'./some_directory/part-m-00027.lzo'): None
  (D)s4cmd.py:161  << wrapper(<ThreadUtil(Thread-30, started daemon 47975898376512)>, None, u's3://some-s3-bucket-of-reports/some_directory/part-m-00027.lzo', u'./some_directory/part-m-00027.lzo'): None
  (D)s4cmd.py:161  >> wrapper(<ThreadUtil(Thread-30, started daemon 47975898376512)>, None, u's3://some-s3-bucket-of-reports/some_directory/part-m-00130.lzo', u'./some_directory/part-m-00130.lzo')
  (D)s4cmd.py:161  >> _kick_off_downloads(<ThreadUtil(Thread-30, started daemon 47975898376512)>, <__main__.S3URL instance at 0x2ba280ea5518>, <Bucket: some-s3-bucket-of-reports>, u's3://some-s3-bucket-of-reports/some_directory/part-m-00130.lzo', u'./some_directory/part-m-00130.lzo')
  (D)s4cmd.py:161  >> mkdirs(<ThreadUtil(Thread-30, started daemon 47975898376512)>, u'./some_directory/part-m-00130.lzo-8wg4jn523kzmsub.tmp')
  (D)s4cmd.py:161  << mkdirs(<ThreadUtil(Thread-30, started daemon 47975898376512)>, u'./some_directory/part-m-00130.lzo-8wg4jn523kzmsub.tmp'): None
  (D)s4cmd.py:161  >> update_privilege(<ThreadUtil(Thread-7, started daemon 47975582550336)>, u's3://some-s3-bucket-of-reports/some_directory/part-m-00005.lzo', u'./some_directory/part-m-00005.lzo-nw9dsi06zyq1ta2.tmp')
  (D)s4cmd.py:161  << update_privilege(<ThreadUtil(Thread-7, started daemon 47975582550336)>, u's3://some-s3-bucket-of-reports/some_directory/part-m-00005.lzo', u'./some_directory/part-m-00005.lzo-nw9dsi06zyq1ta2.tmp'): None
  (D)s4cmd.py:161  >> _verify_file_size(<ThreadUtil(Thread-7, started daemon 47975582550336)>, <Key: some-s3-bucket-of-reports,some_directory/part-m-00005.lzo>, u'./some_directory/part-m-00005.lzo-nw9dsi06zyq1ta2.tmp')
  (D)s4cmd.py:161  << _verify_file_size(<ThreadUtil(Thread-7, started daemon 47975582550336)>, <Key: some-s3-bucket-of-reports,some_directory/part-m-00005.lzo>, u'./some_directory/part-m-00005.lzo-nw9dsi06zyq1ta2.tmp'): None
  (D)s4cmd.py:161  << _kick_off_downloads(<ThreadUtil(Thread-7, started daemon 47975582550336)>, <__main__.S3URL instance at 0x2ba22b2d2998>, <Bucket: some-s3-bucket-of-reports>, u's3://some-s3-bucket-of-reports/some_directory/part-m-00005.lzo', u'./some_directory/part-m-00005.lzo'): None
  (D)s4cmd.py:161  << wrapper(<ThreadUtil(Thread-7, started daemon 47975582550336)>, None, u's3://some-s3-bucket-of-reports/some_directory/part-m-00005.lzo', u'./some_directory/part-m-00005.lzo'): None
  (D)s4cmd.py:161  >> wrapper(<ThreadUtil(Thread-7, started daemon 47975582550336)>, None, u's3://some-s3-bucket-of-reports/some_directory/part-m-00131.lzo', u'./some_directory/part-m-00131.lzo')
  (D)s4cmd.py:161  >> _kick_off_downloads(<ThreadUtil(Thread-7, started daemon 47975582550336)>, <__main__.S3URL instance at 0x2ba280f5a5a8>, <Bucket: some-s3-bucket-of-reports>, u's3://some-s3-bucket-of-reports/some_directory/part-m-00131.lzo', u'./some_directory/part-m-00131.lzo')
  (D)s4cmd.py:161  >> update_privilege(<ThreadUtil(Thread-52, started daemon 47976129415488)>, u's3://some-s3-bucket-of-reports/some_directory/part-m-00051.lzo', u'./some_directory/part-m-00051.lzo-6u9r43nvsxi0gcz.tmp')
  (D)s4cmd.py:161  >> mkdirs(<ThreadUtil(Thread-7, started daemon 47975582550336)>, u'./some_directory/part-m-00131.lzo-vkx45jmoy6f789p.tmp')
  (D)s4cmd.py:161  << mkdirs(<ThreadUtil(Thread-7, started daemon 47975582550336)>, u'./some_directory/part-m-00131.lzo-vkx45jmoy6f789p.tmp'): None
  (D)s4cmd.py:161  << update_privilege(<ThreadUtil(Thread-52, started daemon 47976129415488)>, u's3://some-s3-bucket-of-reports/some_directory/part-m-00051.lzo', u'./some_directory/part-m-00051.lzo-6u9r43nvsxi0gcz.tmp'): None
  (D)s4cmd.py:161  >> _verify_file_size(<ThreadUtil(Thread-52, started daemon 47976129415488)>, <Key: some-s3-bucket-of-reports,some_directory/part-m-00051.lzo>, u'./some_directory/part-m-00051.lzo-6u9r43nvsxi0gcz.tmp')
  (D)s4cmd.py:161  << _verify_file_size(<ThreadUtil(Thread-52, started daemon 47976129415488)>, <Key: some-s3-bucket-of-reports,some_directory/part-m-00051.lzo>, u'./some_directory/part-m-00051.lzo-6u9r43nvsxi0gcz.tmp'): None
  (D)s4cmd.py:161  << _kick_off_downloads(<ThreadUtil(Thread-52, started daemon 47976129415488)>, <__main__.S3URL instance at 0x2ba246233dd0>, <Bucket: some-s3-bucket-of-reports>, u's3://some-s3-bucket-of-reports/some_directory/part-m-00051.lzo', u'./some_directory/part-m-00051.lzo'): None
  (D)s4cmd.py:161  << wrapper(<ThreadUtil(Thread-52, started daemon 47976129415488)>, None, u's3://some-s3-bucket-of-reports/some_directory/part-m-00051.lzo', u'./some_directory/part-m-00051.lzo'): None
  (D)s4cmd.py:161  >> wrapper(<ThreadUtil(Thread-52, started daemon 47976129415488)>, None, u's3://some-s3-bucket-of-reports/some_directory/part-m-00132.lzo', u'./some_directory/part-m-00132.lzo')
  (D)s4cmd.py:161  >> _kick_off_downloads(<ThreadUtil(Thread-52, started daemon 47976129415488)>, <__main__.S3URL instance at 0x2ba25847bbd8>, <Bucket: some-s3-bucket-of-reports>, u's3://some-s3-bucket-of-reports/some_directory/part-m-00132.lzo', u'./some_directory/part-m-00132.lzo')
  (D)s4cmd.py:161  >> mkdirs(<ThreadUtil(Thread-52, started daemon 47976129415488)>, u'./some_directory/part-m-00132.lzo-jfordlqmhey6347.tmp')
  (D)s4cmd.py:161  << mkdirs(<ThreadUtil(Thread-52, started daemon 47976129415488)>, u'./some_directory/part-m-00132.lzo-jfordlqmhey6347.tmp'): None
  (D)s4cmd.py:161  >> update_privilege(<ThreadUtil(Thread-50, started daemon 47976108435776)>, u's3://some-s3-bucket-of-reports/some_directory/part-m-00049.lzo', u'./some_directory/part-m-00049.lzo-ipnzl24k3yeshcq.tmp')
  (D)s4cmd.py:161  << update_privilege(<ThreadUtil(Thread-50, started daemon 47976108435776)>, u's3://some-s3-bucket-of-reports/some_directory/part-m-00049.lzo', u'./some_directory/part-m-00049.lzo-ipnzl24k3yeshcq.tmp'): None
  (D)s4cmd.py:161  >> _verify_file_size(<ThreadUtil(Thread-50, started daemon 47976108435776)>, <Key: some-s3-bucket-of-reports,some_directory/part-m-00049.lzo>, u'./some_directory/part-m-00049.lzo-ipnzl24k3yeshcq.tmp')
  (D)s4cmd.py:161  << _verify_file_size(<ThreadUtil(Thread-50, started daemon 47976108435776)>, <Key: some-s3-bucket-of-reports,some_directory/part-m-00049.lzo>, u'./some_directory/part-m-00049.lzo-ipnzl24k3yeshcq.tmp'): None
  (D)s4cmd.py:161  << _kick_off_downloads(<ThreadUtil(Thread-50, started daemon 47976108435776)>, <__main__.S3URL instance at 0x2ba2462335f0>, <Bucket: some-s3-bucket-of-reports>, u's3://some-s3-bucket-of-reports/some_directory/part-m-00049.lzo', u'./some_directory/part-m-00049.lzo'): None
  (D)s4cmd.py:161  << wrapper(<ThreadUtil(Thread-50, started daemon 47976108435776)>, None, u's3://some-s3-bucket-of-reports/some_directory/part-m-00049.lzo', u'./some_directory/part-m-00049.lzo'): None
  (D)s4cmd.py:161  >> wrapper(<ThreadUtil(Thread-50, started daemon 47976108435776)>, None, u's3://some-s3-bucket-of-reports/some_directory/part-m-00133.lzo', u'./some_directory/part-m-00133.lzo')
  (D)s4cmd.py:161  >> _kick_off_downloads(<ThreadUtil(Thread-50, started daemon 47976108435776)>, <__main__.S3URL instance at 0x2ba280eb4560>, <Bucket: some-s3-bucket-of-reports>, u's3://some-s3-bucket-of-reports/some_directory/part-m-00133.lzo', u'./some_directory/part-m-00133.lzo')
  (D)s4cmd.py:161  >> mkdirs(<ThreadUtil(Thread-50, started daemon 47976108435776)>, u'./some_directory/part-m-00133.lzo-xjg163lkdib7aty.tmp')
  (D)s4cmd.py:161  << mkdirs(<ThreadUtil(Thread-50, started daemon 47976108435776)>, u'./some_directory/part-m-00133.lzo-xjg163lkdib7aty.tmp'): None
  (E)s4cmd.py:173  [OSError] 2: No such file or directory
  (D)s4cmd.py:161  >> update_privilege(<ThreadUtil(Thread-16, started daemon 47975751518528)>, u's3://some-s3-bucket-of-reports/some_directory/part-m-00067.lzo', u'./some_directory/part-m-00067.lzo-jsvo3ah1ei64cp9.tmp')
  (D)s4cmd.py:161  << update_privilege(<ThreadUtil(Thread-16, started daemon 47975751518528)>, u's3://some-s3-bucket-of-reports/some_directory/part-m-00067.lzo', u'./some_directory/part-m-00067.lzo-jsvo3ah1ei64cp9.tmp'): None
  (D)s4cmd.py:161  >> _verify_file_size(<ThreadUtil(Thread-16, started daemon 47975751518528)>, <Key: some-s3-bucket-of-reports,some_directory/part-m-00067.lzo>, u'./some_directory/part-m-00067.lzo-jsvo3ah1ei64cp9.tmp')
  (E)s4cmd.py:173  [OSError] 2: No such file or directory
  (E)s4cmd.py:173  [Thread Failure] [Errno 2] No such file or directory: './some_directory/part-m-00067.lzo-jsvo3ah1ei64cp9.tmp'
chouhanyang commented 9 years ago

OK, that's helpful, thanks! My guess is that S3 throttling kicks in at some point, so some files can't be downloaded and their temp files never even exist. My suggestions would be:

  1. Try downloading without recursion (use wildcards instead): s4cmd.py get -f s3://my-s3-bucket/my_directory/*
  2. Use fewer threads with --num-threads. I generally use 16 threads; see if that works for you, or try even fewer.

We can definitely provide a more useful error message, BTW.

gstaubli commented 9 years ago

Unfortunately, using wildcards didn't solve the problem, but the download did run much longer than usual (66 min), so maybe it helped to some degree. Trying now with fewer threads.

gstaubli commented 9 years ago

Got another failure even when using --num-threads 16, after downloading 15GB (303 files) over about 80 min.

chouhanyang commented 9 years ago

Are you running s4cmd on EC2 or outside AWS?

In this case, the only thing I can suggest is to modify the RETRY_DELAY in s4cmd.py. You can set it longer, like 1-5 minutes, in case the network does have issues from time to time; retrying after 10 seconds may not be long enough.

We plan to implement exponential backoff, but haven't gotten to that yet. Hope this helps.

gstaubli commented 9 years ago

Running on a non-AWS server located in San Francisco. I'll try working with the RETRY_DELAY.

I did another test out of curiosity. I used s4cmd to get a listing of all the files in a "directory," extracted the file names from the "ls" output, and ran s4cmd via "xargs -P12 -n1 s4cmd get", like so:

time s4cmd.py ls --recursive s3://some-s3-bucket-of-reports/some_directory | cut -d" " -f4 | xargs -P12 -n1 s4cmd.py --num-threads 1 get

Unfortunately, even this failed on at least 1 file (it succeeded on many others).

gstaubli commented 9 years ago

Regarding exponential delay, s3cmd has a simple delay process that may be worth replicating. Would you be interested in a pull request with an exponential delay?

chouhanyang commented 9 years ago

Not sure what the process is in s3cmd, but I think it's a one-line change on line 378:

time.sleep(RETRY_DELAY)

to

time.sleep(RETRY_DELAY * (2 ** retry))

That's it. Feel free to try it and see if it works for you. Thanks!
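[Editor's note: a slightly fuller version of that one-line change caps the delay and adds random jitter, so that many threads retrying at once don't all hit S3 again at the same instant. A sketch using the RETRY_DELAY constant and retry counter from s4cmd.py quoted above; the cap and jitter factors are arbitrary choices, not s4cmd defaults.]

  import random
  import time

  RETRY_DELAY = 10   # s4cmd's existing base delay, in seconds
  MAX_DELAY = 300    # arbitrary cap: never wait more than 5 minutes

  def backoff_sleep(retry):
      # Exponential backoff: 10s, 20s, 40s, ... capped at MAX_DELAY.
      delay = min(RETRY_DELAY * (2 ** retry), MAX_DELAY)
      # Jitter spreads simultaneous retries from many threads apart.
      time.sleep(delay * random.uniform(0.5, 1.0))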

gstaubli commented 9 years ago

While I have done only one test with the exponential delay you recommended, it successfully downloaded 23GB, so that's a great first sample run. I'm running the same test again and will report back with the results.

JohnHBrock commented 9 years ago

I'm also encountering this bug on v1.5.20. Changing the time.sleep line to use an exponentially increasing delay didn't seem to help.

dcelasun commented 8 years ago

This has been open for more than a year now, any progress on this?

agnibha commented 8 years ago

+1 @dcelasun. I just faced this 2 days ago. I had around 4k objects, each around 250MB. The s4cmd sync failed 5-7 minutes into the run.

meyersj commented 8 years ago

Failing for me as well, using sync from a non-AWS machine. Reducing the number of threads seemed to help.

chouhanyang commented 8 years ago

The new s4cmd 2.0 has switched to the boto3 library. Can folks give it a try and see if it is more stable against this issue? Thanks.
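[Editor's note: one reason the boto3 switch may help is that botocore carries its own configurable retry machinery. A minimal sketch of downloading a single object with a raised retry limit; the bucket, key, and local path are placeholders.]

  import boto3
  from botocore.config import Config

  # Raise the per-request retry limit above botocore's default.
  s3 = boto3.client("s3", config=Config(retries={"max_attempts": 10}))
  s3.download_file("my-s3-bucket",
                   "my_directory/part-m-00000.lzo",
                   "./my_directory/part-m-00000.lzo")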

jimmymoore commented 8 years ago

It's still failing pretty consistently.