aws / aws-sdk-js

AWS SDK for JavaScript in the browser and Node.js
https://aws.amazon.com/developer/language/javascript/
Apache License 2.0

Multipart Upload using generated pre-signed URL #468

Closed vkovalskiy closed 9 years ago

vkovalskiy commented 9 years ago

In this article (http://blogs.aws.amazon.com/javascript/post/Tx3EQZP53BODXWF/Announcing-the-Amazon-S3-Managed-Uploader-in-the-AWS-SDK-for-JavaScript) Loren wrote about an awesome feature of the aws-sdk: multipart upload from the browser.

Most of the time, all the browser side can get is a pre-signed PUT URL that was generated on the backend with the actual AWS credentials.

Can this multipart upload feature use such a pre-signed URL to make the upload, so that we don't have to keep any AWS credentials on the client side?

AdityaManohar commented 9 years ago

@vkovalskiy The pre-signed URL that is returned by the getSignedUrl() operation is useful for accessing objects without the SDK.

If you are using the SDK to perform an operation, you will have to authenticate it. You can use Amazon Cognito or Web Identity Federation to vend credentials in the browser.
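
For example, a minimal browser sketch of that approach (the identity pool ID and bucket name here are placeholders):

AWS.config.region = 'us-east-1';
AWS.config.credentials = new AWS.CognitoIdentityCredentials({
  IdentityPoolId: 'us-east-1:00000000-0000-0000-0000-000000000000'
});

var s3 = new AWS.S3({params: {Bucket: 'your-bucket'}});
// The managed uploader can now run entirely in the browser:
s3.upload({Key: 'big-file.bin', Body: file}, function(err, data) {
  if (err) console.log(err);
  else console.log('Uploaded to', data.Location);
});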

I hope this helps!

jontelm commented 9 years ago

https://github.com/bookingbricks/file-upload-example

I have only been able to upload a file without any content, or when I set Body in the getSignedUrl call. It seems like Content-Length defaults to 0?

Edit: The problem was Content-Type; it works now.
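
For anyone else hitting this: if ContentType is included in the signed params, the client's PUT has to send exactly that Content-Type header, or the signature check fails. A rough sketch (bucket, key, and type are placeholders):

// Server side: ContentType becomes part of the signature.
var signedUrl = s3.getSignedUrl('putObject', {
  Bucket: 'your-bucket',
  Key: 'photo.png',
  ContentType: 'image/png',
  Expires: 600
});

// Client side: the PUT must carry the same header, e.g.
//   xhr.setRequestHeader('Content-Type', 'image/png');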

lsegal commented 9 years ago

@vkovalskiy to answer your question specifically: you can theoretically generate signed URLs for multipart uploads, but it would be fairly difficult to do. You could initiate the multipart upload on the backend on behalf of the user, but you would have to generate a signed URL for each individual uploadPart call, which means knowing exactly how many bytes the user is uploading, as well as keeping track of the ETag from each uploadPart request the user sends so that you can complete the multipart upload. If you ever implement this I'd be interested to see what you come up with!
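
For anyone who wants to attempt it anyway, a rough Node.js sketch of that flow (bucket and key are placeholders, error handling omitted):

var AWS = require('aws-sdk');
var s3 = new AWS.S3();

// 1. Initiate the multipart upload on behalf of the user.
s3.createMultipartUpload({Bucket: 'your-bucket', Key: 'big-file.bin'}, function(err, mpu) {
  // 2. Sign one URL per uploadPart call; this requires knowing the
  //    number of parts (i.e. the total upload size) up front.
  var url = s3.getSignedUrl('uploadPart', {
    Bucket: 'your-bucket',
    Key: 'big-file.bin',
    UploadId: mpu.UploadId,
    PartNumber: 1,
    Expires: 600
  });
  // 3. Hand the URLs to the client, collect the ETag each part upload
  //    returns, then call completeMultipartUpload with the
  //    (PartNumber, ETag) list.
});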

AdityaManohar commented 9 years ago

@vkovalskiy I'm closing this out since there hasn't been any activity. Feel free to re-open this issue or open another issue if you have any other questions.

shooding commented 9 years ago

It will work. My application server generates a pre-signed URL for each uploadPart. The client can upload parts to these pre-signed URLs, and each response carries an ETag.

The tricky part is that Content-Type is not required (or rather, should not be present) in UploadPart requests, which is different from PutObject (if you have tried a low-level REST PutObject against a pre-signed URL, you will know what I mean). Otherwise you will get 403 Forbidden (signature does not match) on UploadPart. Finally, as @lsegal said, you need to keep track of each ETag and complete the multipart upload, so you must be able to assemble the parts information on the client side.
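
A rough browser-side sketch of one part upload under those constraints (signedUrl comes from the server; blobPart is a slice of the File):

var xhr = new XMLHttpRequest();
xhr.open('PUT', signedUrl);
// Deliberately do NOT set Content-Type, or the signature won't match.
xhr.onload = function() {
  // S3 returns the part's MD5 as the ETag header. Reading it cross-origin
  // requires ExposeHeaders: ETag in the bucket's CORS configuration.
  var etag = xhr.getResponseHeader('ETag');
  // Remember (partNumber, etag) for completeMultipartUpload later.
};
xhr.send(blobPart);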

vkovalskiy commented 9 years ago

Thanks @lsegal and @shooding for the info. I'll give it a try; we really need multipart uploads over unstable connections.

musicullum commented 9 years ago

I'm trying the same thing. The one thing that's missing for the server to generate the pre-signed URL for uploadPart is the "operation" parameter. Would that be "uploadPart"? I might try, but because it's rather complex I'd very much appreciate a solution (even if it's only a confirmation that it can actually be done this way). I'm a bit confused why there is almost no information to be found for this particular case. In order to avoid proxying, it appears to be the only way to achieve direct access to large objects without exposing the secret key in a client application, or am I missing something? Thanks for any help!

lsegal commented 9 years ago

@musicullum as pointed out above, it is possible to do, but you must generate a signed request for each operation, meaning you must know how many parts are being sent and the ETags for each individual part. If you know all this, you can sign each request and send it down to your client.

shooding commented 9 years ago

@vkovalskiy and @musicullum Here is a working example of server-side PHP:

// Initialize your $s3Client first.
$command = $s3Client->getCommand('UploadPart', array(
    'Bucket'     => 'yourBucket',
    'Key'        => 'yourObjectKey',
    'PartNumber' => 1,
    'UploadId'   => 'FromCreateMultipartUpload',
    // ContentType is not required
    'Body'       => '',
));

$signedUrl = $command->createPresignedUrl('+10 minutes');

When you upload a part from the client side, do:

curl -v -X PUT -T {local_path_to_your_file_part} '{signedUrl}'

Since the -v option is on, you can see the HTTP debug information, including the ETag in the response. You should compute an MD5 of your local file part and compare it to the ETag, or just irresponsibly accept everything AWS replies.
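
If it helps, a minimal Node.js sketch of that check (the file path is a placeholder; etagFromS3 is the ETag header from the part's response):

var crypto = require('crypto');
var fs = require('fs');

// For a single-part PUT, S3's ETag is the hex MD5 of the body, quoted.
var md5 = crypto.createHash('md5')
  .update(fs.readFileSync('local_part.bin'))
  .digest('hex');

if (etagFromS3.replace(/"/g, '') !== md5) {
  throw new Error('Part corrupted in transit; re-upload it');
}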

Sorry I didn't use JS on the client side, but I'm pretty sure you can do that easily with jQuery. This way you can avoid proxying large files.

musicullum commented 9 years ago

Thanks a lot @shooding. I figured out the command methods that I was missing, and I have so far successfully downloaded a file this way from a server-generated pre-signed URL. Knowing and understanding how it works, this should now be applicable to the multipart upload sequence. My client is actually a desktop (Qt) application, so with download working this way it should work out. So thanks again; the hint about the missing Content-Type is certainly welcome too!

vkovalskiy commented 9 years ago

thanks a lot @shooding !

andrewgoodchild commented 8 years ago

Hi Everyone,

I have a partially working solution that provides a REST endpoint for a client to start a multipart upload and get back an upload ID. The client can then use that upload ID + part number to get a pre-signed URL from another REST endpoint.

The client then successfully PUTs each part to S3 (without the content type) and gets a 200 OK + an ETag back. We validate the ETag, and the value matches the MD5 hash we used for the upload. Finally, we have the client record the upload IDs and ETags. So everything is good...

Not quite. The final step is where the issues arise. When we try to complete the multipart upload using the upload IDs and part numbers, S3 rejects the complete request, complaining that the part numbers don't exist. I did a list-parts call, and there are no parts recorded at all for the upload ID, despite the fact that each part upload returned a 200 OK and a valid ETag.

Any thoughts? @shooding you mentioned you had this working.

musicullum commented 8 years ago

I figured out that the size allowed for a single-file upload is sufficient for our needs, so I never finished the multipart upload work, sorry. What I remember is that I would create a handshake sequence between client and server: the client asks the server to give it a pre-signed URL, the server returns the URL using the AWS API, the client uploads all the parts, and then requests the server to do the completion via the AWS API again (passing the list of ETags). I seem to recall that it worked that way, but don't take my word for it.

andrewgoodchild commented 8 years ago

Thanks @musicullum.

The process you described is pretty much what we are doing. However, what we are finding is that the returned ETags don't have part numbers in them and only contain the MD5 hash (which is valid for the part, just with no part number). If you ask S3 for the list of parts for a multipart upload ID, you find that none of the parts are registered against the upload ID. I suspect that uploading parts with pre-signed URLs is not possible in AWS. I know that it is supported for Azure:

http://gauravmantri.com/2014/01/06/create-pre-signed-url-using-c-for-uploading-large-files-in-amazon-s3/

In the meantime, I have raised a support ticket with AWS to find out if it is possible.

I would like to hear back from @shooding about his experience.

@vkovalskiy I don't know if you are interested in reopening this question?

-Andrew.

shooding commented 8 years ago

It works, but I was using Signature V2 and am not sure if Signature V4 changes anything.

$result = $s3Client->completeMultipartUpload(array(
    'Bucket'   => 'yourBucket',
    'Key'      => 'yourObjectKey',
    'Parts'    => $part->getETags(),
    'UploadId' => 'FromCreateMultipartUpload',
    // ContentType is not required for this request
));

andrewgoodchild commented 8 years ago

thanks for that @shooding

andrewgoodchild commented 8 years ago

I had a peek into the bucket this morning and saw a bunch of files in S3 with part numbers and upload IDs. The individual parts are being stored in S3 as ordinary files whose names include the part numbers and upload IDs. The pre-signed multipart uploads are being treated as ordinary files and not as parts of a multipart upload. This explains why, during the upload, we see ETags returned without part numbers, and why in the final step none of the parts are registered to the upload ID.

So it appears something has changed in S3. @shooding could get pre-signed multipart uploads to work with v2 signatures, but now that we are using v4 signatures, the bucket is treating pre-signed multipart uploads as ordinary pre-signed uploads.

jeskew commented 8 years ago

@andrewgoodchild How are you generating the presigned URLs? From your description of the ETags returned by S3, it sounds like you're using PutObject URLs instead of UploadPart URLs.
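
For anyone unsure of the difference, the two signing calls look like this in the JavaScript SDK (bucket, key, and upload ID are placeholders):

// Signs a plain single-object PUT; S3 treats the body as a whole object.
var putUrl = s3.getSignedUrl('putObject', {Bucket: 'b', Key: 'k'});

// Signs one part of an already-initiated multipart upload.
var partUrl = s3.getSignedUrl('uploadPart', {
  Bucket: 'b',
  Key: 'k',
  UploadId: uploadId,
  PartNumber: 1
});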

andrewgoodchild commented 8 years ago

The URL is identical for both operations except for the query parameters.

The S3 pre-signed PUT is:

PUT /ObjectName?AWSAccessKeyId=xxxx&Expires=xxxxx&Signature=xxxxxxx HTTP/1.1
Host: BucketName.s3.amazonaws.com
Date: date
Content-Length: Size

And the part upload is:

PUT /ObjectName?partNumber=PartNumber&uploadId=UploadId HTTP/1.1
Host: BucketName.s3.amazonaws.com
Date: date
Content-Length: Size
Authorization: authorization string

What I did was sign the URL /ObjectName?partNumber=PartNumber&uploadId=UploadId to get:

PUT /ObjectName?partNumber=PartNumber&uploadId=UploadId&AWSAccessKeyId=xxxx&Expires=xxxxx&Signature=xxxxxxx HTTP/1.1
Host: BucketName.s3.amazonaws.com
Date: date
Content-Length: Size

Which in turn was treated by S3 as a file called "/ObjectName?partNumber=PartNumber&uploadId=UploadId".

At this stage I am thinking that pre-signing using query params might not be the right approach, and I am better off generating an authorization string on the server, passing it back to the client, and having the client do a vanilla REST part upload using the authorization string.

dnewkerk commented 8 years ago

@andrewgoodchild I came across this issue while trying to solve the same problem, in my case using Ruby. Like you, I was using :put_object and was trying to pass in the key in the same "multipart" format I had used for the signed authorization string (including partNumber and uploadId), resulting in multiple files in my bucket, one for each part and named with the partNumber and uploadId.

I don't yet know much about the JavaScript SDK, however hopefully the same solution can be applied. I asked about the issue in https://gitter.im/aws/aws-sdk-ruby and @trevorrowe pointed me to a solution that worked for me. For reference, here was his reply:

I've never attempted to pre-sign a multipart upload. I suppose it might be possible. I would attempt something like this:

presigner = Aws::S3::Presigner.new
url = presigner.presigned_url(:create_multipart_upload, ...)
# initiate the upload and pull the upload id from the response
presigner.presigned_url(:upload_part, bucket:'name', key:'key', upload_id:'id', part_number: 1)
presigner.presigned_url(:upload_part, bucket:'name', key:'key', upload_id:'id', part_number: 2)
# etc, you need to capture the etag and part number from each of these requests
presigner.presigned_url(:complete_multipart_upload, ...)

That said, I'm curious what your use case is for pre-signed multipart uploads from the browser. I'm wondering if there might be an easier way. You can use a presigned POST from the browser. It will not support objects larger than 5GB, but it would be much simpler.

The ah-ha moment for me was the presigner.presigned_url(:upload_part, bucket:'name', key:'key', upload_id:'id', part_number: 1) line. I applied the idea in my own code, and browser-based multipart uploads using pre-signed URLs worked! This seems to be what @jeskew was alluding to in the comment above as well.

I hope this helps!

andrewgoodchild commented 8 years ago

@dnewkerk Thanks for the heads-up. I had a look at it, and the solution references a gist (https://gist.github.com/dnewkerk/ff1bcebf83fb2f1b58b9) which constructs an authorization header for the client to send as part of an upload.

ThoughtWorks has a blog post on the issue (https://www.thoughtworks.com/mingle/infrastructure/2015/06/15/security-and-s3-multipart-upload.html) which highlights that they bumped into the same problem with pre-signed multipart URLs; they now skip pre-signing and have a server endpoint generate an authorisation header to add to the request.

Lastly, I have been going back and forth with AWS support, and in their console they are seeing pre-signed multipart uploads being treated as single-part uploads.

So while pre-signed multipart uploads may have been possible in the past, it seems they are no longer supported. Either the feature has been silently deprecated, or it fell through the cracks when AWS upgraded S3 at some point.

So for now I am going to work on a solution that provides a REST endpoint to generate authorisation headers for the client to include in its upload.

dnewkerk commented 8 years ago

@andrewgoodchild apologies if I was unclear; on my end pre-signed multipart uploads are working. The gist you mentioned was me sharing with Trevor how I had coded the authorization header before I got pre-signed multipart uploads working.

This is my new version of the (Ruby) code, which gets pre-signed URLs for both single and multipart uploads. Hopefully seeing it will give you ideas for how to solve this similarly in your code:

presigner = Aws::S3::Presigner.new
if upload.multipart?
  presigner.presigned_url(:upload_part, bucket: upload.bucket, key: upload.filename,
    upload_id: upload.multipart_id, part_number: upload_part.part_number)
else
  presigner.presigned_url(:put_object, bucket: upload.bucket, key: upload.filename)
end

Notice how the first parameter of the presigned_url method differs between single and multipart uploads, and that multipart uploads take the regular key (filename) just like single-part uploads, but add parameters for the upload_id and part_number.

Hope this helps :)

andrewgoodchild commented 8 years ago

Thanks @dnewkerk. The docs for the current JavaScript SDK don't seem to list the extra parameters for the operation, upload ID, and part number that you used in the Ruby SDK.

see more here:

http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#getSignedUrl-property

So I tried jamming the extra parameters in there anyway:

var params = {
  Bucket: result.value.bucket,
  Key: location,
  PartNumber: partNumber,
  UploadId: uploadId
};
s3.getSignedUrl('uploadPart', params, function(err, url) { ... });

And the AWS JavaScript SDK seems to have swallowed it. And gosh dang it, it works. I can complete a pre-signed multipart upload...
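
For anyone following along, the final server-side step looks roughly like this (the Parts array is the (PartNumber, ETag) list collected from the client's part uploads):

s3.completeMultipartUpload({
  Bucket: bucket,
  Key: key,
  UploadId: uploadId,
  MultipartUpload: {
    Parts: [
      {PartNumber: 1, ETag: '"etag-from-part-1"'},
      {PartNumber: 2, ETag: '"etag-from-part-2"'}
    ]
  }
}, function(err, data) {
  // data.Location is the URL of the assembled object.
});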

Thank you again @dnewkerk

Merry Xmas everyone !!!

vhmth commented 8 years ago

Have y'all used EvaporateJS? https://github.com/TTLabs/EvaporateJS

You supply an endpoint on your server that returns a signed URL. It asks for a signed URL for each part and handles keeping track of the ETags, aborting, canceling, etc.
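
If I remember its older (v2-signature) mode correctly, the signerUrl endpoint just HMAC-signs the string Evaporate passes in a to_sign query parameter; a minimal Express sketch, assuming that contract:

var crypto = require('crypto');
var express = require('express');
var app = express();

// EvaporateJS GETs signerUrl?to_sign=<string-to-sign> and expects the
// base64 HMAC-SHA1 of it, computed with the AWS secret key, as the body.
app.get('/sign', function(req, res) {
  var signature = crypto.createHmac('sha1', process.env.AWS_SECRET_KEY)
    .update(req.query.to_sign)
    .digest('base64');
  res.send(signature);
});

app.listen(3000);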

andrewgoodchild commented 8 years ago

I had a look at it, but I had some additional requirements, so I just used it for ideas.

BenjaminPoilve commented 8 years ago

Weird, @andrewgoodchild, doing the same as you returns an error for me:

error { [MultipleValidationErrors: There were 2 validation errors:
* UnexpectedParameter: Unexpected key 'PartNumber' found in params
* UnexpectedParameter: Unexpected key 'UploadId' found in params]
  message: 'There were 2 validation errors:\n* UnexpectedParameter: Unexpected key \'PartNumber\' found in params\n* UnexpectedParameter: Unexpected key \'UploadId\' found in params',
  code: 'MultipleValidationErrors',
  errors: 
   [ { [UnexpectedParameter: Unexpected key 'PartNumber' found in params]
       message: 'Unexpected key \'PartNumber\' found in params',
       code: 'UnexpectedParameter',
       time: Fri May 20 2016 19:54:39 GMT+0200 (CEST) },
     { [UnexpectedParameter: Unexpected key 'UploadId' found in params]
       message: 'Unexpected key \'UploadId\' found in params',
       code: 'UnexpectedParameter',
       time: Fri May 20 2016 19:54:39 GMT+0200 (CEST) } ],
  time: Fri May 20 2016 19:54:39 GMT+0200 (CEST) }

Does this feature still work for you, or is it broken? Because no matter how hard I try, those parameters don't seem to be expected by Amazon.

BenjaminPoilve commented 8 years ago

Well, trying to make it work, I can now get a signed URL for each chunk, but I still get a 403 error. It looks like I am missing something. More in this repo.

BenjaminPoilve commented 8 years ago

Well, I never found out what caused my bug, but I recommend this repo! Works perfectly. Great work by @Yuriy-Leonov

abuisine commented 8 years ago

I would suggest having a look at mule uploader; it allows a kind of pre-hash on your server side and is quite robust.

BenjaminPoilve commented 8 years ago

In the end, we found this repo, which works quite well, and I had a friend push it to npm.

FlorinDavid commented 7 years ago

@BenjaminPoilve I had the same issue; I got the same error when I tried to sign an upload part:

* UnexpectedParameter: Unexpected key 'PartNumber' found in params
* UnexpectedParameter: Unexpected key 'UploadId' found in params

My mistake was that I used the putObject operation instead of the uploadPart operation. The right call is:

s3Instance.getSignedUrl('uploadPart', {
  Bucket: 'test-multipart-upload-free',
  Key: '<your_file.ext>',
  UploadId: '9PSnW_U3EgpbV8lmOQR8...',
  PartNumber: <part_number>
}, (error, presignedUrl) => {
  // ... use the URL to upload the part
});

oyeanuj commented 7 years ago

@lsegal @AdityaManohar Just following up to see if this is possible today, since the comments on this issue are a couple of years old?

tomasdev commented 5 years ago

It is definitely possible as of November 27th, 2018.

https://github.com/aws/aws-sdk-js/issues/1603#issuecomment-441926007

prestonlimlianjie commented 5 years ago

@oyeanuj I've created a functioning demo repo with multipart + presigned URL uploads from the browser: https://github.com/prestonlimlianjie/aws-s3-multipart-presigned-upload

fadelafuente1 commented 5 years ago

@shooding thanks for that advice; that was my problem. Now I can upload files with a pre-signed URL and multipart upload. I will try to write a Medium post on the full process of multipart upload with pre-signed URLs.

@shooding : "The tricky part is that Content-Type is not required (or rather, should not be present) in UploadPart requests, which is different from PutObject (if you have tried a low-level REST PutObject against a pre-signed URL, you will know what I mean)."

lock[bot] commented 5 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.