FineUploader / fine-uploader

Multiple file upload plugin with image previews, drag and drop, progress bars. S3 and Azure support, image scaling, form support, chunking, resume, pause, and tons of other features.
https://fineuploader.com
MIT License

Support uploads to S3 via CloudFront distribution #1016

Closed: rnicholus closed this issue 8 years ago

rnicholus commented 11 years ago

The current plan is to support simple (non-chunked) uploads to a CloudFront distribution. Chunked uploads are currently not possible when targeting a CloudFront distribution, since CloudFront rips off the Authorization header containing the signature before forwarding the request on to S3. The Authorization header is required when using any of the S3 multipart upload REST calls, which are needed to support Fine Uploader S3's chunking and auto-resume features. I have opened a request in the CloudFront forums asking for this behavior to be modified so that multipart upload requests can target a CloudFront distribution.
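For illustration, here is a minimal sketch (not Fine Uploader's actual code; the bucket, key, and signature values are placeholders) of the Initiate Multipart Upload request that chunking depends on:

```js
// Initiate Multipart Upload (S3 REST API): POST /<key>?uploads
// The signed Authorization header is mandatory; CloudFront dropping it is
// exactly what breaks chunked uploads through a distribution.
var xhr = new XMLHttpRequest();
xhr.open("POST", "https://mybucket.s3.amazonaws.com/my-object-key?uploads");
xhr.setRequestHeader("x-amz-date", new Date().toUTCString());
xhr.setRequestHeader("Authorization", "AWS ACCESS_KEY:BASE64_SIGNATURE"); // placeholder signature
xhr.onload = function () { console.log(xhr.status, xhr.responseText); };
xhr.send();
```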

The shape of this support is still mostly undetermined, as is whether it will be part of 4.0. Making it part of 4.0 is possible, but looking less likely as I run into issues with CloudFront's handling of upload-related requests. I'm currently struggling to get path patterns for upload requests to work. I've opened another thread in the forum detailing my issue at https://forums.aws.amazon.com/thread.jspa?threadID=137627&tstart=0.

rnicholus commented 11 years ago

No useful responses from Amazon in the forums. This is going to be postponed until a later release.

rnicholus commented 10 years ago

Amazon has confirmed that they have no plans to stop removing the Authorization header from requests. This means that we will not be able to make use of the multipart upload API via a CloudFront distribution.

jasonshah commented 10 years ago

So, is it fair to say that simple uploads to CloudFront are not yet supported? We are evaluating uploaders, and one of our customers wants to upload huge (multi-GB) files. Uploading to S3 with Fine Uploader is proving to be severely limited by their proxy, and I wanted to try uploads straight to CloudFront.

rnicholus commented 10 years ago

The only possible way to upload files that target a CloudFront distribution is to send the entire file in one multipart-encoded POST request. This means that the chunking, resume, and pause features are not possible. If a file fails mid-upload, each retry attempt will need to start over from the first byte. In fact, no credentialed REST calls are possible at all, since CloudFront rips off the Authorization header.

If you do not enable the chunking/resume features in Fine Uploader S3, theoretically, uploads to a CF distribution should work, but there is another problem: Fine Uploader S3 expects to be able to determine the bucket name from your endpoint. If you are uploading to a CF distribution, this assumption is no longer valid. We would have to allow a bucket name to be specified via an option and API method for CF distro endpoints.
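That bucket-from-endpoint assumption can be sketched roughly like this (a hypothetical helper for illustration, not the library's actual code):

```js
// Derive the bucket name from a virtual-hosted S3 endpoint URL. This works
// for "https://mybucket.s3.amazonaws.com", but a CloudFront distribution URL
// carries no bucket information at all, so the lookup fails.
function bucketFromEndpoint(endpoint) {
    var host = new URL(endpoint).hostname;
    return /\.s3[.-]/.test(host) ? host.split(".")[0] : null;
}

bucketFromEndpoint("https://mybucket.s3.amazonaws.com");   // "mybucket"
bucketFromEndpoint("https://d1234example.cloudfront.net"); // null
```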

rnicholus commented 10 years ago

@jasonshah What sort of limits does your customer's proxy enforce?

jasonshah commented 10 years ago

@rnicholus the customer's proxy has some kind of bandwidth throttling in place. In one location, they get 3.5MB/s up (which is very acceptable), whereas in another they get <100K/s up (which isn't). Running some traceroutes and nslookups against the slow computer reveals a proxy server in the way, which is likely limiting bandwidth. We've asked if there's a way to get in touch with their network engineers, but it's a huge company and they treat that as a last resort. So one theory is that uploading to CloudFront might help speed this up.

rnicholus commented 10 years ago

@jasonshah If uploading to CF fixes the issue, I can see why that might be appealing. I haven't pursued support for CF in Fine Uploader S3 for a few reasons:

jasonshah commented 10 years ago

@rnicholus that last issue seems to be the biggest one. The feature was announced four months ago; however, I've yet to find a single example from them of how this might work (though perhaps I haven't looked hard enough).

Thanks for thinking of it. We'll keep looking to see if we can solve this problem another way.

jasonshah commented 10 years ago

@rnicholus Some interesting data, FYI: our customer demonstrated that using the Fine Uploader test to upload from NYC to US East (Virginia), he can achieve ~3-7MB/s. Using another product's uploader, which does a simple PUT to a CDN (EdgeCast in this case), he can achieve 55MB/s. The CDN can provide huge speed increases.

rnicholus commented 10 years ago

I'm afraid I'm not familiar with this CDN. Is the customer cutting AWS out of the picture entirely?

pulkitjalan commented 10 years ago

Amazon has now said that CloudFront does not remove the Authorization header on PUT, POST, PATCH, DELETE, and OPTIONS requests.

https://forums.aws.amazon.com/message.jspa?messageID=528729#528729

rnicholus commented 10 years ago

Excellent. Thanks for the update. That should make it possible for us to modify Fine Uploader S3 to allow uploads through a CF distribution, theoretically.

pulkitjalan commented 10 years ago

I was testing this with Fine Uploader and ran into another issue: CloudFront adds the 'X-Amz-Cf-Id' header to the request. I got past this by using Origin Access Identities, as outlined in this forum post: https://forums.aws.amazon.com/thread.jspa?messageID=345913&#345913.

Looking forward to seeing this feature in Fine Uploader :)

jasonshah commented 10 years ago

@pulkit-clowdy after setting up the OAI, you were able to use FineUploader to upload to S3 via CloudFront? If yes, did you experience any performance improvements?

pulkitjalan commented 10 years ago

I was able to upload via CloudFront and use chunking. Yes, there was a significant improvement in performance, considering my bucket is in the us-east-1 region and I'm uploading from the UK.

Direct to S3: ~1MB/s
S3 via CloudFront: ~4-6MB/s

pulkitjalan commented 10 years ago

At the moment it's quite hacky to get this working; almost all security checks have to be disabled. Is this feature going to be implemented in Fine Uploader, and if so, for which version is it planned?

rnicholus commented 10 years ago

Probably 5.1. 5.0 is currently in development.


pulkitjalan commented 10 years ago

Ok, thanks for the update

cybertrand commented 10 years ago

First of all, thank you @rnicholus for building such a useful piece of software. I'm also looking to use Fine Uploader to upload content to S3 via CloudFront. I understand from this thread that this should be released in 5.1, and was wondering if you have an idea of when that might be.

The company I work for is looking to implement a Web uploader with this specific feature. We might be able to contribute to the project to help develop it if that's something you're interested in.

rnicholus commented 10 years ago

Thanks for the kind words @cybertrand. Fine Uploader wouldn't be where it is today without my employer, Widen, and the development help of @feltnerm along with the input of @uriahcarpenter as well.

You are correct that this feature is scheduled for 5.1, along with several others. I don't think this will be terribly difficult, assuming there aren't any further hidden obstacles (such as the issue where CF stripped Authorization headers a while back, making this impossible - since fixed).

We are currently working on a hotfix and some administrative (non-feature) tasks at the moment. Once those are complete, #1198 is first in line, followed by uploads to CF.

I suspect that the code changes to Fine Uploader S3 will be minimal to support uploads to CF. One thing that will need to change is the code that uses the AWS endpoint to determine the bucket name. You see, we must embed the bucket name in the request, and, with uploads directly to S3, we can programmatically determine the bucket name simply by looking at the S3 endpoint URL. With uploads to a CF distro, that will no longer be a safe assumption, so we will need to solicit the actual bucket name from integrators via an additional option (and provide an API method for dynamic adjustment).

cybertrand commented 10 years ago

Thank you very much for the additional information @rnicholus; makes sense about the S3 bucket name. It's great to hear that this should be straightforward to implement and that it should happen soon. On that last note, are you able to share any rough timeframes: are we talking about 1, 3, 6 months? I'm asking so that we can make the best decision on waiting vs. implementing now.

I was also wondering: will the implementation of this feature include the ability to upload multiple chunks/parts of the same file in parallel (to S3 via CloudFront)? That's what we're after in order to accelerate file uploads, and this would provide a really nice commodity solution, instead of having to buy and implement a UDP-based transfer acceleration solution. Note: it would be really useful to have a similar solution for accelerated downloads, whereby your JS client would perform multiple HTTP Range requests to CloudFront/S3 in parallel to download a single large file (I believe both S3 and CloudFront support this).

Thanks again!

rnicholus commented 10 years ago

Have you read about the concurrent chunking feature we released in 5.0? That allows multiple chunks for a single large file to be uploaded in parallel to any endpoint. That feature is aimed at single-file large uploads, since multiple chunks have always been uploaded in parallel (one per file) if multiple files are selected. http://docs.fineuploader.com/branch/master/features/concurrent-chunking.html
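For reference, a config sketch enabling that feature (the endpoint, access key, and signature server path are placeholders):

```js
var uploader = new qq.s3.FineUploader({
    element: document.getElementById("uploader"),
    request: {
        endpoint: "https://mybucket.s3.amazonaws.com", // placeholder bucket
        accessKey: "YOUR_AWS_PUBLIC_ACCESS_KEY"
    },
    signature: { endpoint: "/s3/signature" }, // your signing server
    chunking: {
        enabled: true,
        concurrent: { enabled: true } // several chunks of one file in flight at once
    }
});
```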


cybertrand commented 10 years ago

I did read about it @rnicholus and was wondering if it would be supported for uploads to S3 via CF and for a single file specifically.

I ran some tests doing multipart uploads of single large files to S3 with 3-5 chunks in parallel via a native client app (Cyberduck, also open source) from Los Angeles to an S3 bucket in US Standard, and can vouch for the fact that it yields huge gains in throughput (consistently reaching 90+Mbps on a dedicated 100Mbps line). Similarly, although a single file/chunk upload to CF will yield much better throughput than one to S3 (since it's uploading to a local CF node), transfers could be further accelerated by allowing multiple chunks in parallel. I don't know how hard this would be to implement or how many changes it requires compared to what you built for S3; just some food for thought.

I can also say from experience that your decision to only implement this feature with chunking, in order to allow pause/resume as well as automatic retry of failed uploads, was a good call. We've had recurring issues uploading single large files to S3 via another CDN: some uploads to S3 fail, our CDN provider confirms a failed PUT to S3, but we don't get any additional information from Amazon...

Sorry, I don't mean to nag, but are you able to share any timeframe for the release of 5.1? Thanks a lot for responding so quickly to everything!

rnicholus commented 10 years ago

The concurrent chunking feature was implemented as a core feature in Fine Uploader 5.0. This means that it is supported for all endpoint types: Azure, S3, custom/traditional. I see no reason why a CF distro would be a special case.

I don't yet have a time estimate for 5.1. Stay tuned, and I'll try to post it when I know more.

cybertrand commented 10 years ago

Great. I'll watch the thread for updates on 5.1. Thanks again!

rnicholus commented 9 years ago

Starting work on this now as part of 5.1.0.

rnicholus commented 9 years ago

@jasonshah @pulkitjalan @cybertrand I expect to be done with this in the very near future. Are any of you interested in testing out the pre-release version to ensure it works with your setup?

jasonshah commented 9 years ago

@rnicholus Our engineering team is super backlogged through the end of the year, but I'll try to see if we can squeeze in some tests. We're really excited about it!

rnicholus commented 9 years ago

@jasonshah Thanks for the update. Please watch this issue for progress. I'll comment here when I'm done, and will include the pre-release version number.

rnicholus commented 9 years ago

Still working on this. It turns out that CloudFront is very poorly designed and documented. Anything other than a trivial default behavior associated with a GET request requires a fair amount of googling and trial & error to get right. Configuring your behaviors, origins, and associated permissions is maddening. Once I figure out how CloudFront works in the context of POST and PUT uploads given a variety of behaviors, I'll begin updating the code.

At the moment, it looks like anything other than an extremely trivial use case (default behavior, one bucket) will require a good understanding of CloudFront by the integrator, as Fine Uploader will need to be updated with the proper bucket on-demand, depending on the CloudFront path. I'll probably add an objectProperties.bucket option that takes either a string or a function, similar to the key property.
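As a sketch of how that proposed option might look in practice (the bucket names and routing rule are made up, and the exact shape could change before release):

```js
var uploader = new qq.s3.FineUploader({
    element: document.getElementById("uploader"),
    request: {
        endpoint: "https://d1234example.cloudfront.net", // placeholder CF distro
        accessKey: "YOUR_AWS_PUBLIC_ACCESS_KEY"
    },
    signature: { endpoint: "/s3/signature" },
    objectProperties: {
        // either a static bucket name for the trivial one-bucket case, or a
        // function for multi-bucket path patterns, similar to the key option:
        bucket: function (fileId) {
            // e.g. route video files to a dedicated bucket (made-up rule)
            return /\.(mp4|mov)$/.test(this.getName(fileId))
                ? "video-bucket"
                : "asset-bucket";
        }
    }
});
```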

rnicholus commented 9 years ago

After being contacted by some users today via email, it seems the progress on this issue is not entirely clear, probably due to the out-of-date limitations listed in the initial description of this feature case.

Let me be clear:

Once I solve the problem described in my last point above, I'll begin committing changes to the code to support uploads to CF. There will be some documentation as well, but mostly in the context of typical, simple Fine Uploader workflows.

rnicholus commented 9 years ago

I have the following post in the CloudFront forum, waiting for a response from the CF team. https://forums.aws.amazon.com/thread.jspa?messageID=588697

andrew-kzoo commented 9 years ago

Thank you @rnicholus for your deep and detailed technical knowledge! Your dedication and steadfastness in solving these issues is appreciated!

rnicholus commented 9 years ago

No problemo @andrew-kzoo. Thanks for the recognition!

I just posted a first commit for this feature in a feature branch. The goal is to not only support uploads to S3 via CF, but via any CDN that supports POST/PUT/CORS (and doesn't rip off or add headers).

See the commit message for details.

rnicholus commented 9 years ago

Unfortunately I have run into yet another issue with CloudFront's support for uploads to S3. CF doesn't allow any headers, other than CORS headers, to be forwarded to an S3 origin via a CF distro. This means that we can't specify an ACL, encryption params, reduced redundancy, custom metadata, or anything else sent along as a header for a chunked file (via the S3 REST API). This also means that, in its current state, uploads to S3 via CF can't be done "serverless" via an identity provider since we must send a session token in an x-header.

I've opened a case here: https://forums.aws.amazon.com/thread.jspa?threadID=167234.

rnicholus commented 9 years ago

Just noticed that I already posted something in the CF forums detailing an issue I was having with POSTs to non-default behavior endpoints back in 2013. No response from AWS. https://forums.aws.amazon.com/message.jspa?messageID=496076. It's unlikely that AWS will respond to any of these issues, since their participation in the forums is inconsistent at best. Most likely, if the CF uploads feature goes out with Fine Uploader 5.1, there will be a lot of warnings in the docs detailing all of the items that are not properly supported by the upload CF workflow due to the shortcomings of this AWS service in this context.

rnicholus commented 9 years ago

Since CloudFront presents very specific challenges to the upload workflow, I've created a new case to address uploads to S3 via a generic CDN that forwards all headers. CloudFront is still not quite ready to handle uploads to S3, it seems, so this may be delayed until AWS gets its act together. Other CDNs, like Fastly, handle uploads to S3 without issue.

rnicholus commented 9 years ago

I expect to have support for CDNs ready to beta-test by Monday, maybe earlier. Anyone who still wants to try this out early, let me know. I am unable to work around the various issues I'm seeing with uploads to CloudFront, so, at this point, I don't expect to declare that uploads to CF are supported. However, the changes I made here should work with uploads to CF, provided AWS behaves more like a transparent proxy.

I would encourage users to try out the pre-release of 5.1.0 with CDN support against CloudFront and report their results. If something can be figured out soon to deal with the CF issues I'm seeing, I can report that 5.1.0 supports uploads to CF. Otherwise, uploads via any CDN that forwards headers will work just fine with the changes I made, provided the bucket is specified, and the CF-specific case will be put back into limbo.

rnicholus commented 9 years ago

If anyone is interested in testing CDN support before 5.1.0 is released, please let me know. The code is part of the develop branch now, and the in-progress section of the docs on docs.fineuploader.com has been updated as well. I encourage you to try against CloudFront as well, in hopes that you will have more luck than me.

cdanzig commented 9 years ago

@rnicholus I'm in the process of exploring chunked uploads to S3 via CloudFront. Were you ever able to get the pieces of the puzzle together?

rnicholus commented 9 years ago

Uploads to CloudFront should work, but there appear to be some issues with CF stripping some headers from the REST requests, which prevented me from certifying support for CF. Amazon never acknowledged the issue, even though I am able to upload files to S3 via a 3rd-party CDN without any problems.

rbliss commented 9 years ago

@rnicholus Talked to the CloudFront guys directly. They're now saying that ALL the headers should pass through appropriately. Can you still recreate the issue, or do you have a curl command that reproduces it?

rnicholus commented 9 years ago

They told me the same thing a little while back. It isn't true, as far as I can tell. I opened up a support request in December detailing the issue. To sum it up, sending upload requests to S3 using the multipart upload API via CloudFront still results in removed headers. For example, x-amz-acl and all x-amz-meta headers are removed. Sending the exact same request directly to S3, or via a different CDN (such as Fastly), did not result in any problems. The issue was specific to CloudFront.

cmwright commented 9 years ago

It doesn't help anyone trying to use CloudFront, but we were able to get uploads via CDN working with Edgecast if someone is looking for a Fastly alternative.

rnicholus commented 9 years ago

Yes, uploads to S3 using Fine Uploader S3 should work via any CDN, provided the CDN does not remove any headers.

rbliss commented 9 years ago

@rnicholus Cool, yeah, it seems like it has been a bit frustrating getting this worked out with CloudFront. I'm in direct contact with them, so if you can tip me off with an example that reproduces the issue, I can follow up.

rnicholus commented 9 years ago

@rbliss Reproducing the issue simply involves setting up Fine Uploader S3 as per usual, specifying a bucket via the new objectProperties.bucket option, and setting the endpoint to your CF distro. You'll see that any custom metadata (sent as x-amz-meta headers) is sent by Fine Uploader, but never actually attached to the object in S3. Since the x-amz-acl header is stripped as well, any ACL value specified via Fine Uploader is ignored and not attached to the object in S3.
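For what it's worth, that reproduction as a config sketch (the distribution URL, bucket, and metadata value are placeholders):

```js
var uploader = new qq.s3.FineUploader({
    element: document.getElementById("uploader"),
    request: {
        endpoint: "https://d1234example.cloudfront.net", // CF distro fronting S3
        accessKey: "YOUR_AWS_PUBLIC_ACCESS_KEY",
        params: { category: "demo" } // sent as an x-amz-meta-category header on chunked uploads
    },
    signature: { endpoint: "/s3/signature" },
    chunking: { enabled: true }, // chunked uploads use the REST API and its headers
    objectProperties: {
        bucket: "mybucket" // must be explicit; not derivable from a CF endpoint
    }
});
// After an upload completes, inspect the object in S3: with CloudFront in the
// path, the x-amz-meta-category value and the ACL never appeared on the object.
```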

I haven't tested again since December. Perhaps AWS has finally fixed this?

rbliss commented 9 years ago

@rnicholus I'll give it a whirl and get back to you on this.

rbliss commented 9 years ago

@rnicholus Good news! I believe I've gotten to the bottom of the issue, and am just waiting to hear back from the CloudFront guys.

It appears the issue isn't CloudFront failing to pass through headers, but rather the S3 object's ACL being set up incorrectly when the object is both created through a CloudFront endpoint and given an ACL of private.

The test is straightforward: in FineUploader, try using a CloudFront endpoint but setting objectProperties.acl to public-read instead of FineUploader's default of private. The object should be created properly, with the passed-through metadata showing up on the object.
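In config terms, the suggested test is a one-option change (a fragment; the rest of the setup is as in the earlier sketches, and the bucket name is a placeholder):

```js
objectProperties: {
    bucket: "mybucket",
    acl: "public-read" // instead of Fine Uploader's default of "private"
}
```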

For some reason, only multipart REST API uploads to S3 through CloudFront with an ACL of private create an improperly permissioned object, where the owner does not have permission to access the object (except, strangely, to delete it). Hence the appearance that the metadata headers are not passed through: you are not allowed to read the metadata!

Anyway, try that and tell me if I'm crazy. I do not know if other ACLs have similar problems.

rbliss commented 9 years ago

Note: you can look up the ACL setting on an object with:

aws s3api get-object-acl --bucket <bucket> --key <object key>