aws / aws-sdk-ruby

The official AWS SDK for Ruby.
https://aws.amazon.com/sdk-for-ruby/
Apache License 2.0

Multipart uploads don't work with encryption keys #414

Closed mfischer-zd closed 10 years ago

mfischer-zd commented 10 years ago

I'm trying to upload a fairly large file (~ 200GB) to S3 and have encountered a bug: if an :encryption_key option is passed to AWS::S3::S3Object#write, the upload stops after posting one ~20MB part and one 5-byte part (2 parts total).

Fails (only ~20MB uploaded):

s3obj.write(Pathname.new(path), :encryption_key => some_256_bit_key)

Succeeds:

s3obj.write(Pathname.new(path))

trevorrowe commented 10 years ago

I just responded on the forums with a question for additional debugging information. I'll copy that response here and link to this issue from the forum post.

I was unable to reproduce the issue myself. I created a 200MB text file locally and uploaded it to S3 using a new key like so:

my_key = OpenSSL::Cipher.new("AES-256-ECB").random_key
s3.buckets['my-bucket'].objects['key'].write(Pathname('data.txt'), :encryption_key => my_key)

The upload succeeded and I was able to then download the same object to a file and the resultant file was identical to the source file. Can you give me a code example of how you generate your key? I would like to try to reproduce this issue so we can fix it.
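
Roughly, that round-trip check looks like this (a sketch with placeholder bucket, key, and file names; assumes S3Object#read with :encryption_key for the decrypting download):

require 'aws-sdk'
require 'fileutils'
require 'openssl'
require 'pathname'

my_key = OpenSSL::Cipher.new("AES-256-ECB").random_key
obj = AWS::S3.new.buckets['my-bucket'].objects['key']

# upload with client-side encryption, then read back with the same key
obj.write(Pathname.new('data.txt'), :encryption_key => my_key)
File.open('downloaded.txt', 'wb') { |f| f.write(obj.read(:encryption_key => my_key)) }

# compare the downloaded copy to the source file
FileUtils.compare_file('data.txt', 'downloaded.txt') # => true when identical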

Also, it would be very helpful to see some logging output. Here is a small snippet that would generate some helpful information:

# start by running the interactive console
$ aws-rb

require 'pathname'

# configure a logger that truncates the data to a manageable size
pattern = "[AWS :service :http_response_status :duration :retry_count retries] :operation(:options) :error_class :error_message\n"
log_formatter = AWS::Core::LogFormatter.new(pattern, :max_string_size => 10)

AWS.config(:log_formatter => log_formatter)

# generate a key, please show an example of how you create the key
my_key = OpenSSL::Cipher.new("AES-256-ECB").random_key

# replace the bucket name, key and path as needed
AWS::S3.new.buckets['aws-sdk'].objects['key'].write(Pathname.new('data.txt'), :encryption_key => my_key)

mfischer-zd commented 10 years ago

Thanks, Trevor. For the purpose of this exercise, I'm using the same key generation method as you:

s3obj.write(Pathname.new(path), :encryption_key => OpenSSL::Cipher.new("AES-256-ECB").random_key)

Here's the debugging output you requested (identifying details redacted):

I, [2013-11-20T20:50:37.697816 #23871]  INFO -- : [AWS S3 200 0.437816 0 retries] initiate_multipart_upload(:bucket_name=>#<String "XXXXX" ... (18 bytes)>,:data=>#<AWS::S3::CipherIO:0x00000001398ba8 @stream=#<AWS::Core::ManagedFile:/XXXXX>, @stream_size=202260596110, @orig_cipher=#<OpenSSL::Cipher:0x00000001398b80>, @cipher=#<OpenSSL::Cipher:0x00000001398b58>, @eof=false, @final=nil>,:key=>#<String "2-1/201311" ... (33 bytes)>,:metadata=>{"x-amz-iv"=>#<String "t4wiRHBc9Y" ... (24 bytes)>,"x-amz-key"=>#<String "h4Ba2p67WY" ... (64 bytes)>,"x-amz-matdesc"=>"{}","x-amz-unencrypted-content-length"=>202260596110})  

I, [2013-11-20T20:50:42.560903 #23871]  INFO -- : [AWS S3 200 4.771689 0 retries] upload_part(:bucket_name=>#<String "XXXXXX" ... (18 bytes)>,:data=>#<String "\xBB^\xCA>\xB0\r\xFF\xBB\xDAI" ... (20226060 bytes)>,:key=>#<String "2-1/201311" ... (33 bytes)>,:part_number=>1,:upload_id=>#<String "eoJMsRZGwP" ... (128 bytes)>)  

I, [2013-11-20T20:50:42.759908 #23871]  INFO -- : [AWS S3 200 0.197626 0 retries] upload_part(:bucket_name=>#<String "XXXXXX" ... (18 bytes)>,:data=>"![!\xE1\xFB",:key=>#<String "2-1/201311" ... (33 bytes)>,:part_number=>2,:upload_id=>#<String "eoJMsRZGwP" ... (128 bytes)>)  

I, [2013-11-20T20:50:43.275303 #23871]  INFO -- : [AWS S3 200 0.514918 0 retries] complete_multipart_upload(:bucket_name=>#<String "XXXXXX" ... (18 bytes)>,:key=>#<String "2-1/201311" ... (33 bytes)>,:parts=>[{:etag=>#<String "\"1f8b9657f" ... (34 bytes)>,:part_number=>1},{:etag=>#<String "\"790ad279f" ... (34 bytes)>,:part_number=>2}],:upload_id=>#<String "eoJMsRZGwP" ... (128 bytes)>)

trevorrowe commented 10 years ago

Looking at the logging output, I see the call to initiate_multipart_upload specifies the unencrypted content length as 202260596110 bytes, which is also reflected in the stream size of the opened file object. That is 192,891 megabytes, or roughly 188 gigabytes.
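
A quick check of that arithmetic (binary units):

bytes = 202_260_596_110
bytes / 1024.0 ** 2 # => ~192,891 MB
bytes / 1024.0 ** 3 # => ~188.4 GB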

You described the file as ~ 200 MB, so I am guessing something is going wrong when it tries to determine the file size of your object. That still doesn't explain why only 20 MB get uploaded before the upload is completed.

Does the problem go away if you disable the multipart upload and send the file in a single put request with encryption?

s3obj.write(Pathname.new(path), :encryption_key => my_key, :single_request => true)

mfischer-zd commented 10 years ago

I think you misread; I said 200GB.

With respect to your single-request inquiry, that's not working either:

I, [2013-11-20T21:50:37.116303 #25353]  INFO -- : [AWS S3 200 8.246672 3 retries] put_object(:bucket_name=>#<String "XXX" ... (18 bytes)>,:content_length=>202260596112,:data=>#<AWS::S3::CipherIO:0x0000000273d708 @stream=#<AWS::Core::ManagedFile:XXX>, @stream_size=202260596110, @orig_cipher=#<OpenSSL::Cipher:0x0000000273d578>, @cipher=#<OpenSSL::Cipher:0x00000002c01578>, @eof=false, @final=nil>,:key=>#<String "2-1/201311" ... (33 bytes)>,:metadata=>{"x-amz-iv"=>#<String "wiLT6CzIMu" ... (24 bytes)>,"x-amz-key"=>#<String "1q0ZG6aDTW" ... (64 bytes)>,"x-amz-matdesc"=>"{}","x-amz-unencrypted-content-length"=>202260596110}) Errno::ECONNRESET Connection reset by peer

I can't fathom why S3 is sending us an RST packet mid-upload, but tcpdump confirms it.

trevorrowe commented 10 years ago

You are correct, I misread your original statement on the estimated file size.

S3 is killing the upload because it knows it will not accept the full file. It has a hard limit of 5GB per PUT Object request. Based on the Content-Length given, S3 is choosing to kill the upload before receiving all 200GB. The SDK does have a feature you can enable that waits for a 100-continue from S3 before sending the payload. This can help when S3 wants to redirect or deny an upload based on the request headers alone. However, it also holds back the payload for every request until a response is received or a timeout expires, and in this case we already know why S3 is rejecting the upload.
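
A minimal sketch of the size check that limit implies (not the SDK's internal logic; the constant and method names are just illustrative):

# a single PUT Object request is capped at 5GB; anything larger has to go
# through the multipart upload API
MAX_SINGLE_PUT_BYTES = 5 * 1024 ** 3

def upload_strategy(content_length)
  content_length > MAX_SINGLE_PUT_BYTES ? :multipart : :single_put
end

upload_strategy(202_260_596_112) # => :multipart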

Based on the part size, I think I know where to look. Give me a few mins and I will see if I can dig up the culprit.

mfischer-zd commented 10 years ago

Interestingly, if I create a 200MB, or even a 200GB, zero-filled file via dd if=/dev/zero bs=1M, I can upload the file successfully with both encryption and multipart enabled.

So there appears to be something about the content of the files themselves that is causing the problem. They're gzip-compressed tar files containing MySQL backups, if that's relevant.

trevorrowe commented 10 years ago

Good news, I've narrowed down the bug to the AWS::S3::CipherIO class used to encrypt and stream the file. You can see a repro of the buggy behavior in this gist: https://gist.github.com/trevorrowe/7574548

I'll try to get this fixed shortly. Thank you for your patience.

trevorrowe commented 10 years ago

I've found the bug. It will take a bit of rework on the internals of the CipherIO class to fix for the general case, but there is a really simple workaround. If you set the preferred minimum part size to a number of bytes that is evenly divisible by 16, then everything should work.

# specify the min part size as an even 20MB
s3obj.write(Pathname.new(path), :encryption_key => my_key, :multipart_min_part_size => 20971520)

You can specify this value as a global default via AWS.config:

AWS.config(:multipart_min_part_size => 20971520)

Let me know if this resolves the issue for now. I'll work on coding up a proper fix.

mfischer-zd commented 10 years ago

The workaround is effective, but only if I specify it in the s3obj.write call. It does not appear to be effective if I specify it in AWS.config.

Thanks for the quick diagnosis and workaround!

lsegal commented 10 years ago

@trevorrowe could we not just change the part size picker to choose a number that is divisible by 16? That would at least be a temporary fix in the SDK.

trevorrowe commented 10 years ago

I already have that fix in place. Currently the CipherIO class is only used internally and is marked as @api private. That said, this workaround/fix only works if the stream consumer, e.g. Net::HTTP, only requests chunks that are evenly divisible by 16 bytes. This appears to always be the case.
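
To illustrate the alignment point with plain OpenSSL (this is not the CipherIO internals, just the underlying cipher behavior): AES emits ciphertext in 16-byte blocks, so plaintext that is not block-aligned stays buffered inside the cipher until a later update or the final call.

require 'openssl'

cipher = OpenSSL::Cipher.new("AES-256-ECB")
cipher.encrypt
cipher.random_key # generates a random key and sets it on the cipher

cipher.update("a" * 20).bytesize # => 16 (4 bytes stay buffered in the cipher)
cipher.update("a" * 16).bytesize # => 16 (still 4 bytes buffered)
cipher.final.bytesize            # => 16 (buffered remainder plus PKCS#7 padding)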

trevorrowe commented 10 years ago

@mfischer-zd I'm sorry. The snippet I gave was incorrect. The correct configuration is:

AWS.config(:s3_multipart_min_part_size => 20971520)

I left off the s3_ prefix.