cloudyr / aws.s3

Amazon Simple Storage Service (S3) API Client
https://cloud.r-project.org/package=aws.s3

s3sync from Bucket to Bucket #279

Open Geoiv opened 5 years ago

Geoiv commented 5 years ago


cloudyr devs,

First of all, thank you for your work on such a useful project - the aws.s3 package is immensely helpful to me. I have a feature request that could add even further useful functionality.

The version of sync in this cloudyr package synchronizes files between a local folder and an S3 bucket, but does not appear to be able to synchronize files between two S3 buckets. This functionality is present in the AWS command line tools, shown here: https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html

Given that the command line tools are largely just wrappers around HTTP requests themselves, I believe it should be possible to implement this in much the same way as your other functionality. I feel this would be an incredibly useful addition to your aws.s3 package.

s-u commented 4 years ago

If you look at the s3sync code, it's entirely manual: there is no actual "sync" HTTP command. Even the CLI has to do all the work, figuring out what to copy and then fetching and pushing every single object one by one. So it's really better to use aws s3 sync than to implement the entire complex logic in R. If someone wants to do it, great, I'm happy to look at the PRs, but it's really non-trivial, in particular if you want to do multiple copies in parallel to be fast.
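
To make the "figuring out what to copy" step concrete, here is a minimal sketch in Python of how a bucket-to-bucket plan could be computed from two object listings. It is not the aws.s3 or AWS CLI implementation. The names `ObjectInfo` and `plan_bucket_sync` are hypothetical, the listings are plain dictionaries standing in for whatever a real bucket listing returns, and the comparison rule (copy when the key is missing, the size differs, or the destination copy is older) is an assumption chosen for illustration.

```python
from typing import Dict, List, NamedTuple


class ObjectInfo(NamedTuple):
    """Metadata for one object, as a bucket listing might report it (hypothetical type)."""
    size: int             # object size in bytes
    last_modified: float  # modification time, seconds since the epoch


def plan_bucket_sync(source: Dict[str, ObjectInfo],
                     dest: Dict[str, ObjectInfo]) -> List[str]:
    """Return the keys that would need copying from source to dest.

    Assumed rule: copy a key if it is missing from dest, if the sizes
    differ, or if the dest copy is older than the source copy.
    Keys that exist only in dest are left alone (no deletion).
    """
    to_copy = []
    for key, src in sorted(source.items()):
        dst = dest.get(key)
        if dst is None:
            to_copy.append(key)  # not present in the destination bucket
        elif dst.size != src.size or dst.last_modified < src.last_modified:
            to_copy.append(key)  # present, but looks stale
    return to_copy


# Made-up listings for two buckets.
source_listing = {
    "a.csv": ObjectInfo(size=10, last_modified=100.0),
    "b.csv": ObjectInfo(size=20, last_modified=200.0),
    "c.csv": ObjectInfo(size=30, last_modified=300.0),
}
dest_listing = {
    "a.csv": ObjectInfo(size=10, last_modified=150.0),   # up to date
    "b.csv": ObjectInfo(size=20, last_modified=100.0),   # older than source
    "old.csv": ObjectInfo(size=5, last_modified=50.0),   # only in dest
}

plan = plan_bucket_sync(source_listing, dest_listing)
print(plan)
```

The plan is only the cheap part. Each key in it still needs its own copy request, and issuing those requests concurrently, with retries and error handling, is where most of the complexity mentioned above comes from.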