cloudyr / aws.s3

Amazon Simple Storage Service (S3) API Client
https://cloud.r-project.org/package=aws.s3

s3sync() download with prefix fails #389

Open leonidliu opened 3 years ago

leonidliu commented 3 years ago

Possible bug. I'm trying to download with s3sync() while supplying a prefix. The download fails on the first "file" it tries to download, whose name (after the prefix is stripped) appears to be an empty string.

Code:

library(aws.s3)
s3sync(bucket = "tmc-research-projects",
       prefix = "p039_fa_meta/",
       path = "~/Downloads/p039_fa_meta",
       direction = "download")

Last few lines of output, including the error:

names after prefix filter:
 [1] ""                                  "10/"                              
 [3] "10/civis/"                         "10/civis/civis_20201011.csv"      
 [5] "10/swayable/"                      "10/swayable/swayable_20201011.csv"
 [7] "civis_addl_subgroups.csv"          "civis_data.csv"                   
 [9] "recoded_tags.csv"                  "swayable_data.csv"                
10 bucket objects not found in local directory
<== Saving object 'p039_fa_meta/' to '~/Downloads/p039_fa_meta/'
Error in curl::curl_fetch_disk(url, x$path, handle = handle) : 
  Failed to open file /Users/leoliu/Downloads/p039_fa_meta.

Traceback:

> traceback()
8: curl::curl_fetch_disk(url, x$path, handle = handle)
7: request_fetch.write_disk(req$output, req$url, handle)
6: request_fetch(req$output, req$url, handle)
5: request_perform(req, hu$handle$handle)
4: httr::GET(url, H, query = query, write_disk, show_progress, ...)
3: s3HTTP(verb = "GET", bucket = bucket, path = paste0("/", object), 
       headers = headers, write_disk = httr::write_disk(path = file, 
           overwrite = overwrite), ...)
2: save_object(object = key, bucket = bucket, file = dst, ...)
1: s3sync(bucket = "tmc-research-projects", prefix = "p039_fa_meta/", 
       path = "~/Downloads/p039_fa_meta", direction = "download")

Session Info:

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] aws.s3_0.3.22

loaded via a namespace (and not attached):
[1] httr_1.4.2          compiler_4.0.2      R6_2.4.1            tools_4.0.2        
[5] base64enc_0.1-3     curl_4.3            aws.signature_0.6.0 xml2_1.3.2         
[9] digest_0.6.27      

wdwatkins commented 2 years ago

Did you create the p039_fa_meta/ prefix manually in the console? I have encountered similar errors, and I think they occur because S3 has to create an empty file as a placeholder so that the 'directory' can exist in S3 without any files in it. Deleting that empty file fixed this error for me.
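For anyone who cannot delete the placeholder, a minimal sketch of filtering it out before downloading is below. It uses only base R on a vector of keys copied from the output in the original report. The helper name drop_placeholder_keys is hypothetical (it is not part of aws.s3), and treating every empty key or key ending in "/" as a placeholder is an assumption about how the console creates "directories".

```r
# Hypothetical helper (not part of aws.s3): drop keys that look like
# console-created "folder" placeholders, i.e. empty keys or keys ending in "/".
drop_placeholder_keys <- function(keys) {
  keys[nzchar(keys) & !grepl("/$", keys)]
}

# Keys rebuilt from the "names after prefix filter" output in the report,
# with the prefix put back in front of each name.
prefix <- "p039_fa_meta/"
keys <- paste0(prefix, c(
  "", "10/", "10/civis/", "10/civis/civis_20201011.csv",
  "10/swayable/", "10/swayable/swayable_20201011.csv",
  "civis_addl_subgroups.csv", "civis_data.csv",
  "recoded_tags.csv", "swayable_data.csv"
))

# Only real objects remain; the placeholders are gone.
files <- drop_placeholder_keys(keys)
print(files)
```

The remaining keys could then be passed one at a time to save_object() instead of calling s3sync(), though I have not tested that against this bucket.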

wdwatkins commented 2 years ago

Since this is avoidable by deleting the empty file, or by uploading your files via an S3 client rather than creating directories manually, you could argue this isn't a high priority to fix in the package. It would be nice in some cases, however.

If you want to reproduce it yourself: 1) create a "directory" in the AWS console, then 2) try to sync with s3sync(). You should see the error about a file path ending with a period, and the empty file in the bucket contents.
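The path problem can also be illustrated without touching AWS. The sketch below is base R only and assumes, based on the "Saving object 'p039_fa_meta/' to '~/Downloads/p039_fa_meta/'" line in the report, that each destination is the local path plus the key with the prefix stripped. That mapping is my reading of the output, not a copy of the package's code, and the variable names are illustrative.

```r
# Assumed mapping from S3 key to local destination (a guess based on the
# log output in the report, not the actual s3sync() implementation).
prefix    <- "p039_fa_meta/"
local_dir <- "~/Downloads/p039_fa_meta"
keys      <- c("p039_fa_meta/", "p039_fa_meta/civis_data.csv")

# Strip the prefix: the placeholder key becomes an empty string.
relative <- substring(keys, nchar(prefix) + 1L)

# Build destinations: the placeholder maps onto the directory itself,
# which cannot be opened as a file for writing.
dest <- paste0(local_dir, "/", relative)

# Flag destinations that are really directories, not files.
is_placeholder <- !nzchar(relative) | endsWith(relative, "/")

print(data.frame(key = keys, dest = dest, is_placeholder = is_placeholder))
```

Under that assumption, the first row is the one that triggers the "Failed to open file" error, because its destination is a directory rather than a file.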

jesse-ross commented 1 year ago

Can confirm that this happened to me as well with a bucket created through the console. I believe that it would be worth fixing, and would be happy to contribute a fix if this project is still active!