FineUploader / fine-uploader

Multiple file upload plugin with image previews, drag and drop, progress bars. S3 and Azure support, image scaling, form support, chunking, resume, pause, and tons of other features.
https://fineuploader.com
MIT License

Support multiple simultaneous chunk uploads #937

Closed rodneytamblyn closed 10 years ago

rodneytamblyn commented 11 years ago

For resumable uploads where the file has been divided into chunks, support uploading multiple chunks of a single file simultaneously (to increase upload speed).

rnicholus commented 11 years ago

Hi Rodney. Putting aside the complexities of supporting this for now, I guess I don't see how uploading multiple parts simultaneously would decrease the total upload time.

Let's say you have a 10 Mbps upstream pipe and you are uploading a 10 MB file. Normally, this file, if it uses your entire pipe (and ignoring any other overhead), could be uploaded in 8 seconds. Now, if we split it into, say, 2 chunks of 5 MB each and send each chunk one after the other, the 1st chunk uploads in 4 seconds and the 2nd in 4 more seconds, for a total of 8 seconds. If we instead send both chunks at once, they are still sharing the same 10 Mbps pipe, so I don't see how the total upload time could magically decrease to less than 8 seconds.

Am I missing something?

alexmcmillan commented 11 years ago

What if the client has a 100 Mbps upstream pipe, but a load-balanced server setup is under strain and can only receive 20 Mbps per server?

rnicholus commented 11 years ago

Yes, I will agree that this particular scenario would benefit from sending multiple chunks at once. However, the exact same benefit can be obtained by setting the maxConnections option to 5 and allowing up to 5 files to upload simultaneously. It seems like a safe bet that most, if not all, web apps can expect users to select/drop multiple files at a time, or nearly at the same time. In that case, why not just allow 5 files to upload at once? It's already possible with the current version of Fine Uploader (and has been possible for a long time). Is there something specific about your web app that would not make this a simpler and just-as-effective solution?
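For anyone reading along, raising the simultaneous-file limit is just the existing maxConnections option; a minimal, illustrative setup (the element id and endpoint are placeholders) might look like:

var uploader = new qq.FineUploader({
    element: document.getElementById("uploader"),   // placeholder container element
    request: {
        endpoint: "/uploads"                         // placeholder upload endpoint
    },
    // Allow up to 5 upload requests (i.e. 5 files, without parallel chunking)
    // to be in flight at the same time.
    maxConnections: 5
});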

rodneytamblyn commented 11 years ago

We are trying to support users who may want to upload very large files such as mp4 videos or pathology images, which can easily range into the hundreds of MB. Using chunking and load balancing we can upload files up to 500 MB successfully. Multi-chunk upload would improve the efficiency of these uploads...

Rodney


rnicholus commented 11 years ago

I'm going under the assumption that your users are going to upload more than one file at a time. Let me know if this is not true, and please provide details about your web app or customer base that would suggest this is the norm among users of your app.

With that in mind, let's go with @alexmcmillan's example again of a 100 Mbps pipe with a load balancer in front of, say, 5 servers with a maximum throughput of 20 Mbps per server. The load balancer distributes requests appropriately.

Let's say a user submits or drops 5 files. What is the difference between uploading those 5 files simultaneously, but sending the chunks for each individual file consecutively, versus sending 5 chunks of the first file in parallel, then doing the same with the next file once the first file is done, etc? It's not clear to me what the advantage of the latter approach is, especially since the former approach is already supported by Fine Uploader.

Also, keep in mind that all browsers impose a maximum number of requests that can be executing per hostname at once. Chrome and most modern browsers have a limit of 6, I believe.

rodneytamblyn commented 11 years ago

Hi Ray,

Thanks for the responses. A few more comments:

A few things to keep in mind here:

I'm going under the assumption that your users are going to upload more than one file at a time. Let me know if this is not true, and please provide details about your web app or customer base that would suggest this is the norm among users of your app.

They might upload multiple files, or they could be uploading a single, large file. OB3 is a web application for online academic study. It provides on-screen documents, and users can drag and drop content from their desktop into a document. So this could be a bunch of separate files (e.g. images, file attachments) or it could be a single large file (e.g. a 500 MB mp4 video).

With that in mind, let's go with @alexmcmillan's example again of a 100 Mbps pipe with a load balancer in front of, say, 5 servers with a maximum throughput of 20 Mbps per server. The load balancer distributes requests appropriately.

Our backend comprises OB3 web application servers (NodeJS) running on multiple server instances. Currently (in development mode) we are running 4 OB3 servers on 2 machines (hosted on Rackspace) behind a load balancer. Uploaded chunks may be sent (by the LB) to separate instances, so we keep track of the chunks in our database, and when all chunks have arrived we reassemble the file and shift it to CloudFiles for distribution. This is now working and we have successfully tested uploading files up to 500 MB in size.
Let's say a user submits or drops 5 files. What is the difference between uploading those 5 files simultaneously, but sending the chunks for each individual file consecutively, versus sending 5 chunks of the first file in parallel, then doing the same with the next file once the first file is done, etc? It's not clear to me what the advantage of the latter approach is, especially since the former approach is already supported by Fine Uploader.

In this specific instance you'd be right. However, they could drag in more than 5 files… so in this situation we'd be looking at potentially queueing files. This will be the next evolution for us, implementing some type of upload manager.

There are two specific scenarios: the user uploading several (or lots of) relatively small files, and the user uploading one or more large to very large files (typically video).

At the more extreme edge, we've had users needing to upload gigabytes of files, e.g. pathology images which can be up to 1 GB per file...

Also, keep in mind that all browsers impose a maximum number of requests that can be executing per hostname at once. Chrome and most modern browsers have a limit of 6, I believe.

Thanks - and yes we're aware of that. This can be solved by running separate upload domains.
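(Not code from this thread, just a sketch of that idea: spreading whole-file uploads across separate upload hostnames by rotating the endpoint per file in an onSubmit handler. The subdomains are hypothetical, and cross-origin endpoints also require Fine Uploader's CORS support to be enabled.)

var endpoints = [
    "https://upload1.example.com/uploads",   // hypothetical upload hosts
    "https://upload2.example.com/uploads"
];

var uploader = new qq.FineUploader({
    element: document.getElementById("uploader"),
    request: { endpoint: endpoints[0] },
    cors: { expected: true },                // needed for cross-origin upload endpoints
    callbacks: {
        onSubmit: function(id, name) {
            // Round-robin files across hostnames so each upload domain gets
            // its own per-host browser connection limit.
            uploader.setEndpoint(endpoints[id % endpoints.length], id);
        }
    }
});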


feltnerm commented 11 years ago

IN this specific instance you'd be right. However they could drag in more than 5 files… so in this situation we'd be looking at potentially queueing files. This will be the next evolution for us, implementing some type of upload manager.

It's not a solution, and a bit of an unrelated idea, but ...

I wonder if your upload manager could compress files using zip.js or something similar. This would be quite useful while files are queued, since queued files aren't using the network and the CPU is mostly idle.

As for implementation, you could add all files to be uploaded to a single zipped archive and then upload that, or zip each file individually before sending, or even zip each chunk before sending (I'm going to look into the possibility of this).

The logic for determining when to zip or not (if you want it to be automatically detected and not set by the user) may be non-trivial.
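(A very rough sketch of the per-file variant, assuming zip.js's callback-based createWriter/BlobWriter/BlobReader API; untested here, and the worker configuration zip.js needs is omitted.)

// Compress a single File into a zip Blob before handing it to the uploader.
function zipFile(file, onZipped, onError) {
    zip.createWriter(new zip.BlobWriter("application/zip"), function(writer) {
        writer.add(file.name, new zip.BlobReader(file), function() {
            writer.close(function(zippedBlob) {
                onZipped(zippedBlob);
            });
        });
    }, onError);
}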

rodneytamblyn commented 11 years ago

It's an interesting idea, but in the case of large files the data is presumably already highly compressed, so you may not get much benefit in terms of reduced size. I also wonder what the performance hit of zip.js would be like in the browser.

rodneytamblyn commented 11 years ago

Michael, our lead developer, has come back with a couple of reasons why we want multi-chunk simultaneous upload:

You're missing the time lost between chunks, and load balancing. We want to support smaller chunks for better resume support, so splitting into 10 chunks means pausing 10 times for x seconds each... If you've ever watched a download, it does not start at the maximum speed the connection can handle; it works up to it. E.g. it will not hit its peak of 10 Mbps straight away, and if you split the upload into 10 chunks it stops 10 times, meaning it can never reach its peak.

It also means you cannot provide higher-speed uploads by scaling with parallel servers. Say each server has 10 Mbps of upstream capacity but the client has 100 Mbps, and say we have (for the sake of argument) 20 servers to handle load and no one else happens to be uploading. Even though we have 200 Mbps of capacity, the best the uploading client can hope for is 10 Mbps, because that is the maximum a single server can offer it.

micahnz commented 11 years ago

maxConnections only benefits multiple file uploads; it is absolutely fine for that situation. The situation where parallel chunk uploads would help is large file uploads, so maxConnections does not solve the feature request.

The two situations above are the areas where this feature would benefit.

Parallel chunk uploading would allow uploads to reach higher speeds, since a 1 MB chunk is uploaded so quickly on a high-speed connection that the connection never reaches its peak speed.

It would also make scaling in large deployments more effective. As above, limited speed per server, or a server that is already under heavy load, means the user is stuck uploading at whatever that single server can give it; allowing parallel chunk uploads gives the user a better chance of attaining greater speeds for a single large file upload.

maxConnections does not solve the problem of uploading a 700 MB video, as that file will always be limited to the upstream speed of a single server; maxConnections is only helpful for uploading lots of small files that probably don't require chunking anyway.

Bandwidth is an issue for scaling, and this would help solve that problem... it is more cost-effective to deploy lots of small servers to increase capacity than to use larger single servers with massive upstream capacity.

rnicholus commented 11 years ago

I'll address your points when I have more time, but keep in mind that you can only realistically upload about 4 chunks in parallel. Also, parallel chunk uploading will add a non-trivial amount of complexity to the code base.


micahnz commented 11 years ago

No worries. I am aware of the complexities, but we wanted to raise this as it is something we would like in the near future, so we would have to find a solution. 4 chunks in parallel would make a difference, so that would be absolutely fine... but you can get around the browser limits fairly easily by using multiple domains, e.g. upload1.yourdomain.com, upload2.yourdomain.com, etc.

rnicholus commented 11 years ago

Yes, you are correct. The simultaneous request limit is per hostname. Allowing you to specify the endpoint per-chunk would add still more complexity though.

If your server bandwidth is a notable bottleneck, have you considered using a cloud service that scales for you, such as AWS? AWS offers a high-performance storage service (S3) that is built to scale in order to handle ridiculously high demand. In fact, we are just finishing up a feature that will allow you to upload files directly to S3 from the browser in all supported browsers (including IE7), bypassing your local servers (for the file bytes at least).

I have tagged your feature request as "to discuss". That means that we will discuss it internally at our next planning/story-pointing meeting. If we feel it is something we should schedule, it will be story-pointed and prioritized.

You may have already glanced at the issue tracker in this project. If not, I can tell you that there are a lot of features in the queue at this time. Features are determined to be high-priority if a good number of users request the feature, or if we at Widen feel that the feature would benefit a great number of users. Then there are the normal priority features: those that would only benefit a smaller portion of users.

While I can see how your request might benefit you, it's not clear that it would be a useful feature for many other users. In fact, in my estimation it is likely to be a feature that is rarely used. That classifies this as a low-priority feature. Low-priority features that are complicated to implement aren't likely to be implemented in the near future, or ever, quite frankly. This is because Fine Uploader, like all projects, has a limited number of resources. It will take months and months for us to make it through even a portion of the current list of high-priority issues.

The possibility of this making it into the schedule greatly increases if a good number of other users comment on this case. It looks like we have users from two separate organizations commenting here, unless I'm missing a connection. That's a good start, but I'd really need to be more convinced that this is a feature a good number of users could benefit from. Other low-priority feature requests have been bumped to high-priority based on user input.

I'm not opposed to continuing this discussion. Perhaps something can be worked out, but, again, we have a finite number of resources here on this project, and a finite amount of hours in a day, with many other higher priority features in the schedule at this time. I'd be willing to discuss in a chat room or in some other real-time environment if anyone thinks it will be helpful to the discussion.

micahnz commented 11 years ago

We use Rackspace CloudFiles.

What are the security implications with uploading directly to a storage service like S3?

Will it still support resuming? Also I'm not quite sure that feature would solve the small chunks problem as I mentioned above.

Will you support cloudfiles?

I will be interested to see what comes of this.

rnicholus commented 11 years ago

What are the security implications with uploading directly to a storage service like S3?

It is fairly easy to lock down your AWS S3 bucket to prevent a malicious user from compromising your files. Also, your local server is required to sign requests (using your secret key stored server-side) before Fine Uploader sends them off. I have a lengthy blog post written up that covers all of this (including how to lock down your buckets) for users who want to upload directly to S3 using Fine Uploader. The blog post is not publicly available yet, but will be when Fine Uploader 3.8 releases (hopefully some time next week).

Will it still support resuming?

Yes. In fact, all Fine Uploader features are supported when using the upload-to-s3 module.

I'm not quite sure that feature would solve the small chunks problem as I mentioned above.

"Small" chunks are generally not a good idea due to the overhead associated with sending such a request. The minimum chunk size (enforced by Amazon) is 5 MiB. Furthermore, S3 will theoretically (automatically) scale to such a degree that the throughput bottleneck will likely not be the endpoint (S3).

Will you support cloudfiles?

Seems possible, but it's not yet in our schedule. In fact, you are the first to request it. We have discussed it internally (at a very high level), but that's about it. It looks like CloudFiles allows HTML form uploads and recently added CORS support for their REST API, so that's promising. I'd invite you to open up a separate feature request for this if you'd like us to consider it. The tight integration with Amazon S3 that 3.8 will provide was a significant amount of work, and I suspect CloudFiles will be no walk in the park either. But if it is something Rackspace users would like to have, we will certainly prioritize it accordingly.
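For reference, a minimal Fine Uploader S3 setup along the lines described in the answers above might look like the following; the bucket endpoint, access key, and signature endpoint are illustrative placeholders:

var uploader = new qq.s3.FineUploader({
    element: document.getElementById("uploader"),
    request: {
        endpoint: "my-bucket.s3.amazonaws.com",   // placeholder bucket endpoint
        accessKey: "AKIA_PLACEHOLDER"             // public key only; the secret stays server-side
    },
    signature: {
        endpoint: "/s3/signature"                 // local server signs each request with the secret key
    },
    chunking: {
        enabled: true                             // parts default to 5 MiB, S3's minimum part size
    },
    resume: {
        enabled: true
    }
});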

rnicholus commented 11 years ago

@michaelmitchell Fine Uploader 3.8 has been released with native support for S3. You can play with a live demo on fineuploader.com, which also contains links to documentation explaining the support in more detail, if you are interested.

i0nC4nn0n commented 10 years ago

I'm sorry to revive an issue that hasn't seen activity in the past 6 months, but this feature would be strongly appreciated. Regarding the point that you "wouldn't see how uploading multiple parts simultaneously would decrease the total upload time": I could post some screenshots to prove the point, but instead might I suggest you try uploading a single file of, say, 100 MB to Amazon S3 and then uploading 4 files of 25 MB each simultaneously (with multiple set to true), and you will see the difference in transfer speeds. I of course understand that you are busy, that this feature isn't highly requested, and that due to complexity issues it might never be done, but even if you won't implement it, might I ask for some pointers? (Although I'm not a JS developer) I might be able to hack it in there somehow, and contribute something else back besides the 80 bucks.

rnicholus commented 10 years ago

The only obvious use case where sending simultaneous chunks would be beneficial involves a single large file.

When several files are being uploaded simultaneously, sending simultaneous chunks per file provides no additional benefit in most cases. This is because all browsers restrict the number of simultaneous open requests per host. Fine Uploader enforces this restriction by allowing no more than 3 upload requests at a time (though this is configurable). Granted, we could be a little more flexible with this restriction by tracking open requests per hostname. If you are sending each file to a different endpoint, only then does the ability to upload simultaneous chunks make sense.

Adding this behavior to Fine Uploader may not be simple. The following high-level tasks must be completed:

I think the hardest part will be refactoring the handler code to allow for this behavior. This behavior should also be tied to an option that is turned off by default. Also, as with any feature, the devil is in the details. I'm sure more complexities will pop up during development of this feature, as we will need to break an assumption that the library has relied on since the creation of the chunking feature.

jasonshah commented 10 years ago

We are attempting to use Fine Uploader to replace our aging web-based uploader, and this is an issue for us. Many of our customers wish to upload multi-GB files. We wish to send those files straight to S3, ideally. However, without parallelized uploads, a single multi-GB file will require hours to upload.

@rnicholus , I totally understand the complexity required in getting there. I just want to add my vote to seeing this feature.

rnicholus commented 10 years ago

We are aware that this is becoming a popular feature request, and will prioritize it accordingly. It's likely that this will make it into a near future release. The current release cycle, 4.4, is full at this time, but perhaps 4.5.

We always give priority to heavily requested features, and this one is certainly a popular one.

rnicholus commented 10 years ago

I'm moving this up the list, as I see it is a high-priority feature in light of the response from the community. I think the best approach may be to make an attempt to use all available connections (per the maxConnections option) to upload the set of files. If maxConnections is set to 3 (the default) and 1 file is submitted, and that file is chunkable, it will be uploaded 3 chunks at a time. If 2 files are submitted, as many as 3 chunks between the two files will be uploaded simultaneously.

I don't see an obvious need to create a new option to enable this. Preferably, Fine Uploader will send multiple chunks for a file at once whenever possible by default.

Retry and resume attempts may be very complicated to restructure in order to accommodate this workflow. I've increased the story points to reflect this. Fine Uploader doesn't track specific chunks in the upload handlers. Instead, there is a concept of the number of chunks that have yet to be uploaded or the index of the last chunk that was attempted. If a number of chunks for a file are attempted at once, this assumption breaks down, especially if only one of these chunks fails to upload. We will probably have to persist the index of the last consecutive chunk to successfully upload (for the resume feature). The retry code should probably only attempt to upload any chunks that haven't uploaded successfully yet though.

Progress tracking may be a bit tricky as well.

This is no longer an S3-specific feature. It should probably apply to all endpoint types.

rnicholus commented 10 years ago

Currently scheduled for 4.5.

rnicholus commented 10 years ago

After further investigation, this feature will probably have to be disabled by default and tied to an option. Otherwise, we run the risk of breaking any app with a server that expects chunks to be sent one at a time, in order. I suspect most traditional endpoint handlers will look for the last chunk request and then attempt to combine all chunks. With this feature, the last chunk may arrive at the same time or even slightly before previous chunks.

rnicholus commented 10 years ago

After some more thought and internal discussion:

Goals

  1. Use all available HTTP connections to upload files: 1 connection per chunk. Max # of connections is specified via the existing maxConnections option.
  2. Connection distribution algorithm is greedy: give out available connections on a first-come-first-served basis (see the sketch after this list). For example, let's say we have two files in the queue, 3 available connections, and the first file requires 6 chunks to be uploaded. All 3 connections will be given to the first file until there are fewer than 3 chunks remaining to be uploaded. So, all 3 connections will be used to upload the first 3 chunks, then we will continue to use all 3 connections on this file until only 2 chunks remain. At this point, 2 connections will be given to upload the last 2 chunks and the 3rd connection will be given to the 2nd file for its first chunk. As the 1st file finishes its remaining chunks, the available connections will be distributed to this 2nd file to allow it to upload its chunks in parallel.
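A tiny illustration of that greedy hand-out (not Fine Uploader's internal code, just the intent of goal 2):

// Give each queued file as many of the free connections as it still has chunks,
// spilling any surplus over to the next file in the queue.
function distributeConnections(files, freeConnections) {
    var allocations = [];
    files.forEach(function(file) {
        var granted = Math.min(file.remainingChunks, freeConnections);
        if (granted > 0) {
            allocations.push({ fileId: file.id, connections: granted });
            freeConnections -= granted;
        }
    });
    return allocations;
}

// With 3 free connections, a first file down to its last 2 chunks, and a second
// file with 6 chunks left:
// distributeConnections([{id: 0, remainingChunks: 2}, {id: 1, remainingChunks: 6}], 3)
//   => [{fileId: 0, connections: 2}, {fileId: 1, connections: 1}]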

Non-goals

  1. An option to turn off the greediness described in goal 2. In the scenario described there, the first file would only be given 2 connections, and the 2nd file would be given one, so both files are uploading at the same time. If 3 files were present, each file would only have access to 1 connection at a time.
  2. Allowing the spread of chunks for a single file across multiple endpoints. To make full use of this, we would need to allow the maxConnections option to be applied separately to each unique request endpoint domain. Also, we would need to augment or adjust the API to allow integrators to specify multiple endpoints, with the understanding that Fine Uploader would evenly distribute chunks/files across all specified endpoints, maxing out allowed HTTP connections per unique domain. Further complexity comes from allowing these endpoints to be adjusted or specified per-file. I can see how this may be useful, but I'm not sure I want to shoehorn this into this feature, as we already have a lot to do here.

Implementation requirements:

This feature will seemingly require a significant amount of internal code refactoring. Currently, Fine Uploader relies internally on the assumption that each file will only consume 1 upload-related HTTP connection at once. This makes it relatively straightforward for handler.base.js to distribute available connections: 1 connection per file, and after a file completes entirely, the free connection is given to the next file in the queue.

  1. handler.base.js will need to be able to ask the specific endpoint handler to upload specific chunks, by chunk index, if the parallel chunk uploads feature is turned on. In order to do this properly, the base handler will need to be aware of the remaining chunks for each file, and when a specific chunk upload is complete. If the parallel chunks feature is disabled the base handler will probably still ask for specific chunks to be uploaded, by index, but ensure that only one connection is allocated per file. If chunking is not possible or disabled, the base handler will likely just ask the handler implementation to upload the specific file, not specifying the chunk index. The bottom line here is that we probably need to consider moving more control of the upload process out of the handler implementations and into the base handler.
  2. Currently, we only track the index of the last chunk that completed. The status of each chunk will need to be tracked, since chunks may easily upload out of order; this is required to support the retry, resume, and pause features (see the sketch after this list for one possible shape of this per-chunk state). This tracking should ideally happen in abstract.handler.xhr.js. Currently, chunk-related info is mostly tracked inside of each endpoint handler implementation. We will also need to track the XHR instance associated with each chunk separately.
  3. The retry logic will need to change. Currently, a retry starts with the chunk after the last successful chunk, and continues until the file is finished. Now, we will need to be aware of the specific chunks that failed, which may not be sequential. If any chunk fails to upload for a specific file, the upload will be considered a failure. If the failed chunk does not succeed after all auto retry attempts, the file will be considered a failure. However, we will continue to upload chunks associated with other connections available to that file unless those fail as well. If all connections associated with that file are still failing after auto retries have been exhausted, all connections previously allocated to that file will be redistributed to other files. All of this means that the onAutoRetry callback may be invoked multiple times for the same file simultaneously if, for example, multiple chunks fail at once and are retried at the same time. Perhaps we will pass a 4th parameter to onAutoRetry and onManualRetry that includes the chunk index that we are retrying. The qq.status of a file will only change to qq.status.UPLOAD_FAILED after all auto-retries have failed, and the file is otherwise complete (even if all but one chunk has completed successfully). As you can see, dealing with failures here is potentially very tricky.
  4. Currently, we only persist the index of the last chunk that uploaded successfully. The auto-resume feature will need to be aware of which specific chunks need to be uploaded for each file. This means that we will need to, at the very least, persist the chunk indexes that have yet to upload. This may also be an appropriate time to switch persistence of the resume data from cookies to localStorage for traditional endpoints, as described in #1024. The logic that persists this data and reads it back from storage should probably be moved out of the specific endpoint handler implementations and into abstract.handler.xhr.js.
  5. When an upload is cancelled, all XHR requests for that file must be aborted. There may be multiple XHR requests associated with a file at one time if multiple chunks are being uploaded for that file in parallel.
  6. The pause feature simply aborts an in-progress upload and then starts it up again (effectively "retrying" or "resuming" it) starting with the chunk after the last successful one. The changes to resume, retry, and cancel above should effectively cover most of the pause feature. There is a small amount of pause-related logic in the individual endpoint handlers. The logic for handling and tracking pauses should probably be moved entirely out of the individual endpoint handler implementations and into either the base handler or the abstract XHR handler.
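As a rough illustration of the per-chunk bookkeeping described in items 2 and 4 (one possible shape only, not the actual internals):

var chunkStates = {};   // keyed by file id

function initChunkState(fileId, totalChunks) {
    var statuses = [];
    for (var i = 0; i < totalChunks; i++) {
        statuses.push("queued");   // "queued" | "in-progress" | "complete" | "failed"
    }
    chunkStates[fileId] = {
        statuses: statuses,
        xhrs: {}                   // in-flight XMLHttpRequest per chunk index
    };
}

// For retry/resume, report the specific chunk indexes that still need to upload;
// this set (rather than a single "last successful index") is what would be
// persisted, e.g. in localStorage, for the auto-resume feature.
function pendingChunks(fileId) {
    var pending = [];
    chunkStates[fileId].statuses.forEach(function(status, index) {
        if (status !== "complete") {
            pending.push(index);
        }
    });
    return pending;
}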

Server-side implications

There are effectively no changes required server-side for S3 and Azure endpoints. Both APIs allow for parallel chunk uploading per object/blob.

There is, however, more complexity associated with traditional endpoint handlers when this feature is enabled. Traditional endpoint handlers may be looking only at the chunk index to determine if the entire file has been transferred. With this feature enabled, that is not a safe assumption, as parts can be uploaded out of order. Traditional endpoint handlers that want to make use of this feature will need to determine whether all parts have arrived before attempting to combine them.

One simple way to do this may be to send an ajax request via a Fine Uploader onComplete callback handler, which should be triggered after the file has been sent in its entirety. This is probably not an ideal solution, as errors associated with merging the parts will not be reflected in the status of the upload. It is generally advisable to hold off responding to the last upload chunk request until all parts have been successfully combined into the original file. This way, the server can easily alert Fine Uploader of a problem via the response (which can be reflected in the UI and in the state associated with the file internally). A common problem that results in merge failure is a mismatch between the actual file size and the expected total file size.
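As a bare-bones sketch of that bookkeeping for a traditional endpoint (assuming a Node/Express handler, multer for multipart parsing, and Fine Uploader's default chunking parameter names qqfile, qquuid, qqpartindex, and qqtotalparts; combineParts is a hypothetical merge step):

var express = require("express");
var multer = require("multer");

var app = express();
var upload = multer({ dest: "chunks/" });   // stores each part body on disk

// Which part indexes have arrived for each upload, keyed by the file's qquuid.
var receivedParts = {};

function combineParts(uuid, partPaths, done) {
    // Placeholder: a real implementation would append the part files in index
    // order into the final file and verify the total size before responding.
    done(null);
}

app.post("/uploads", upload.single("qqfile"), function(req, res) {
    var uuid = req.body.qquuid;
    var partIndex = parseInt(req.body.qqpartindex, 10);
    var totalParts = parseInt(req.body.qqtotalparts, 10);

    receivedParts[uuid] = receivedParts[uuid] || {};
    receivedParts[uuid][partIndex] = req.file.path;

    if (Object.keys(receivedParts[uuid]).length === totalParts) {
        // With concurrent chunks the "last" part may arrive before earlier ones,
        // so merge only once every part has been recorded, and hold this response
        // until the merge finishes so a failure is reflected in the upload status.
        combineParts(uuid, receivedParts[uuid], function(err) {
            res.json({ success: !err });
        });
    }
    else {
        res.json({ success: true });
    }
});

app.listen(8080);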

rnicholus commented 10 years ago

@rodneytamblyn @alexmcmillan @michaelmitchell @i0nC4nn0n @jasonshah Would any of you be interested in beta testing this feature once it is more complete?

The following tasks must be addressed for the concurrent chunk workflow before this feature can be realistically beta tested:

jasonshah commented 10 years ago

@rnicholus yes, we would be interested in beta testing.

rnicholus commented 10 years ago

@jasonshah Good to hear. You can follow the progress of this feature here, as all related commits will reference this issue number.

rnicholus commented 10 years ago

Note that my internal tests that involved uploading multiple chunks for a specific file concurrently showed significant improvements in bandwidth utilization. For example, on our internal network, sending a 110 MB file to S3 with chunk sizes of 5 MB took about 22 seconds when chunks were uploaded one-at-a-time. When maxing out the default maxConnections for that file (3 chunks at once) the same file uploaded in about 12 seconds.

rnicholus commented 10 years ago

Only a few things left before this can be beta tested:

In addition, it will be useful to type up a post on http://blog.fineuploader.com outlining the benefits of concurrent chunk uploads, but this is not required before beta testing.

rnicholus commented 10 years ago

Accidentally mis-tagged the final planned commits for this feature:

rnicholus commented 10 years ago

@jasonshah I think we're ready for beta testing. All feature work has been merged into the develop branch. I've updated the documentation as well. The concurrent chunking feature page is a good place to start. Also, there are breaking changes here (as this will be a 5.0 release) so you will want to read the upgrading to 5.x notes as well. Let me know when you would like to try this out and provide feedback.

jasonshah commented 10 years ago

@rnicholus, unfortunately we haven't rolled out FineUploader yet, and are probably a few months away from being able to slot this in. I will keep you posted.


rnicholus commented 10 years ago

@jasonshah Thanks for the update.

If anyone else in this thread is interested in beta testing before the feature is officially released, let me know. Otherwise, this will be released as designed.

rodneytamblyn commented 10 years ago

Ray, we have implemented chunked upload support in OB3 using Fine Uploader, with server-side tracking of chunks (as each chunk can potentially arrive on a different server behind the load balancer), so we could probably test this for you if it would help. Contact me if this would be useful for you guys. Great to see this feature being added to the product.

jasonshah commented 10 years ago

@rnicholus I was wrong, we are ready to beta test as well! Please send me details.

rnicholus commented 10 years ago

@jasonshah Please send me a message. I'm not sure I have your address.

thisiskell commented 10 years ago

Ray, the multipart upload seems to be working great! Here are my results for a 193MB file:

10MB chunks, not concurrent: 1:08s
10MB chunks, concurrent: 0:38s
10MB chunks, concurrent, 10 maxConnections: 0:24s

From your experience, what seems to be the sweet spot between chunkSize and maxConnections?

I did run into a JavaScript error the first time I tried to upload: "TypeError: dataTransfer.items is undefined"

qq.isFolderDropSupported = function(dataTransfer) {
    return (dataTransfer.items.length > 0 && dataTransfer.items[0].webkitGetAsEntry);
};

I changed the function to just return false, and then everything started working fine. I am testing with Firefox on a PC.
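(For anyone hitting the same TypeError before an official fix, a defensive variant of that check, not the actual patch, would be something like:)

qq.isFolderDropSupported = function(dataTransfer) {
    // Guard against browsers (e.g. Firefox) where dataTransfer.items is undefined.
    return !!(dataTransfer.items &&
        dataTransfer.items.length > 0 &&
        dataTransfer.items[0].webkitGetAsEntry);
};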

rnicholus commented 10 years ago

I'll take a look at the TypeError shortly. That function should always return false in FF anyway, but true in Chrome and Opera 15+.

I haven't determined what optimal settings are for chunkSize & maxConnections. I can tell you that all browsers limit the number of open HTTP connections per host. For example, I believe that number in Chrome is 6 or so. Once you hit this limit, there is unlikely to be any benefit gained by increasing the maxConnections value further.

The minimum chunk size for S3 is 5MB, which is the default value when using Fine Uploader S3. I have only tested with the default chunkSize value myself.

rnicholus commented 10 years ago

The TypeError you are seeing appears to be a bug in Firefox, as far as I can tell. The items property is part of the DataTransfer interface, but it is coming up undefined in recent versions of FF. The nightly version of the HTML5 spec suggests that items should be defined. I'll look around and perhaps file a bug w/ FF. I guess we'll have to work around this though.

rnicholus commented 10 years ago

Let me backtrack a bit... After peering at the history of that line of code in Fine Uploader, we may have always been accounting for this undefined property, but 0b5b02f3db3a7f2d02ad005872a706d8a7461b23 introduced a regression that broke this check. Luckily, that change did not make it into the master branch yet. I'll re-open #1166 until we get this fixed.

rnicholus commented 10 years ago

@thisiskell I should have this TypeError fixed in 5.0.0-5. I'll send you an updated build.

thisiskell commented 10 years ago

Hey Ray, everything has been working great, except I discovered an issue in IE8 during upload cleanup: "Object doesn't support this property or method"

s3.jquery.fineuploader-5.0.0-5.js, line 4276

For the time being, I have wrapped that statement with a try/catch and it seems to be working.

rnicholus commented 10 years ago

@thisiskell Can you elaborate on the "upload cleanup" step? I'd like to reproduce this locally so I can ensure it is fixed properly.

thisiskell commented 10 years ago

Sure, in upload.cleanup(), there is this block of code which is causing the error:

            if (handler._getFileState(id)) {
                handler._clearXhrs(id);
            }

I am testing using an IE8 VM, which can be downloaded here: http://www.modern.ie/en-us/virtualization-tools#downloads
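(The temporary try/catch wrapper mentioned above presumably looks something like this; a stopgap only, not the eventual fix:)

try {
    if (handler._getFileState(id)) {
        handler._clearXhrs(id);
    }
}
catch (ex) {
    // IE8 throws "Object doesn't support this property or method" here;
    // swallow it until the underlying cleanup issue is fixed.
}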

rnicholus commented 10 years ago

Ah, upload.cleanup(), thanks. Yep, we are already using modern.IE vms for testing in IE7+. I'll make sure this is fixed before the 5.0 release.

What sort of upload speed gains did you see?


thisiskell commented 10 years ago

Great, happy to help!

I don't have any hard numbers at the moment, but from my tests a couple of weeks ago, the multiple simultaneous chunks seemed to have improved our upload speed anywhere from 50-100% over the standard chunking. Very happy!

I have yet to go back and tune my settings to find optimal performance, but am currently configured to have 10 maxConnections with 20MB chunks.

thisiskell commented 10 years ago

Hey Ray, discovered one more issue: Safari on Mac seems to fail when uploading a file to S3 which is larger than 1 chunk. It works fine when the file is exactly 1 chunk.

Here is what amazon is returning after uploading the last chunk:

<Error>
  <Code>EntityTooSmall</Code>
  <Message>Your proposed upload is smaller than the minimum allowed size</Message>
  <ETag>12345</ETag>
  <MinSizeAllowed>5242880</MinSizeAllowed>
  <ProposedSize>0</ProposedSize>
  <RequestId>12345</RequestId>
  <HostId>12345</HostId>
  <PartNumber>5</PartNumber>
</Error>

And here is what s3.jquery.fineuploader-5.0.0-5.js is reporting:

[Error] [Fine Uploader 5.0.0-5] Complete Multipart Upload request for 0 failed with status 400.
[Error] [Fine Uploader 5.0.0-5] Problem finalizing chunks for file ID 0 - Problem asking Amazon to combine the parts!

rnicholus commented 10 years ago

What version of safari?


thisiskell commented 10 years ago

Strange, my xml error did not render properly in the comment... I will send it to you separately in an email.

We have now reproduced it on 3 machines running Mac Safari version 7.0.3, and our Fine Uploader is currently configured to use 20 MB chunks with 10 maxConnections.

Let me know if you need any more details from me.

rnicholus commented 10 years ago

XML must be included in a fenced code block. I'll edit your post to fix it.
