flexera-public / right_aws

RightScale Amazon Web Services Ruby Gems
MIT License

Single-threaded multipart upload support #116

Closed jschneiderhan closed 12 years ago

jschneiderhan commented 12 years ago

I've attempted to add support for the Amazon S3 Multipart Upload API to right_aws. My main motivation is that I would like to send the output of a mysqldump call directly to S3 without writing it to disk first. I thought something along the following lines would be nice:

IO.popen("mysqldump -u root mytestdatabase | gzip") do |pipe|
  key = RightAws::S3::Key.create(bucket, 'mysqldump.sql.gz')
  key.data = pipe
  key.put_multipart(:part_size => 5*1024*1024)
end

The multipart upload API does not require you to know the content's full size at the start of transmission, and if a problem is encountered while sending a particular part it can be retried in isolation instead of failing the entire upload. This can be useful when attempting to send files over unreliable networks.
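To illustrate what retrying a part "in isolation" can look like, here is a minimal sketch. The helper name `with_part_retries`, its `max_attempts` parameter, and the simulated sender are all hypothetical and are not part of right_aws or of this patch; the block stands in for whatever call actually sends one part.

```ruby
# Hypothetical helper (not part of right_aws): retries the block for a single
# part without touching parts that were already sent successfully.
def with_part_retries(part_number, max_attempts = 3)
  attempts = 0
  begin
    attempts += 1
    yield part_number
  rescue StandardError
    # Re-run only this part; give up once max_attempts is reached.
    retry if attempts < max_attempts
    raise
  end
end

# Simulated sender that fails twice before succeeding, standing in for a
# flaky network connection.
calls = 0
etag = with_part_retries(1) do |n|
  calls += 1
  raise IOError, 'connection reset' if calls < 3
  "etag-for-part-#{n}"
end
```

Only the failing part is re-sent; the rest of the upload is unaffected.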

Before writing this I came across a branch by rgeyer@90df9561, which added multipart support back in January 2011. His approach looks great and has the potential to speed up large file uploads dramatically. Unfortunately, it didn't solve my particular use case, since it still requires the content size before starting the upload. I also wonder whether the fact that it is multi-threaded is preventing the branch from being merged, since a commenter pointed out that right_aws is not thread-safe (I'm not sure if this is still the case). I used much of the code from that branch here but took a single-threaded approach: buffer each part of the input, then send it, in a loop. This implementation will be nowhere near as fast as uploading multiple parts simultaneously, but it allows transmission to start immediately (great for reading from pipes or sockets) and lets failed parts be retried individually.
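The buffer-then-send loop can be sketched roughly as follows. This is not the code from the patch: `each_part` is a hypothetical name, and a `StringIO` stands in for the pipe so the example runs on its own. It relies only on `IO#read(length)`, which returns up to `length` bytes and `nil` at end of input, so the total size never needs to be known up front.

```ruby
require 'stringio'

# Hypothetical helper: reads fixed-size parts from any IO-like object and
# yields [part_number, data]. Only one part is held in memory at a time.
# Returns the number of parts read.
def each_part(io, part_size)
  part_number = 0
  while (chunk = io.read(part_size))
    break if chunk.empty?          # guards against a zero part_size
    part_number += 1
    yield part_number, chunk       # a real uploader would send the part here
  end
  part_number
end

# Stand-in for a pipe: 12 bytes split into parts of at most 5 bytes.
io = StringIO.new('a' * 12)
parts = []
total = each_part(io, 5) { |n, data| parts << [n, data.bytesize] }
```

With a real pipe, `io.read(part_size)` blocks until a full part is available or the writer closes, so every part except possibly the last has exactly `part_size` bytes. That matters because S3 requires every part except the last to be at least 5 MB.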

konstantin-dzreev commented 12 years ago

added, thank you