Imagine uploading a 2GB file in 2MB chunks. That means 1000 GET requests just to test the current status of the file transfer. With the current implementation, it would also check the total size of the chunks hundreds of times to find out whether the upload has completed, and the server-side script would have to join 1000 pieces within a single request. The original developer of the library really didn't test this with large files.
Hi,
nice to hear that you are interested in flow.js development. The library is currently far from a perfect solution, and this feature could make it a bit better.
This would be a breaking change, though, so I am creating a develop branch and I hope to release a flow.js v3. If you are interested in implementing this feature, please make a pull request to the develop branch.
How to implement this? We could borrow some ideas from the tus project: http://tus.io/protocols/resumable-upload.html#5 I like the idea of a HEAD request with an Offset header.
Tomorrow I hope to write down some more thoughts about this, and maybe we can set some milestones for the next release (and perhaps do some refactoring of the code and tests).
Great to hear from you and to know that the project is active. The problem is, the project I'm working on doesn't give me much time, nor does it allow less than perfect solutions. So my task is to take Flow and make it perfect :) I don't have much experience with Git or GitHub, so I'll just send you my version as soon as it is ready. I have already started the work. So far I have moved the "test" method from the "FlowChunk" object to "FlowFile", because that is where it belongs. The purpose of the function is to check with the server whether the upload of the file (not of a chunk) is possible and, if any data was uploaded before, to resume the file's entire upload from there. I have completely dropped the feature that allows simultaneous uploads. It's a nice idea that will unfortunately never see a practical implementation. I'll argue my case as soon as I have some free time.
For the testing function, the link you posted describes exactly the functionality I have in mind. I will stick to GET though, as I've seen servers reject HEAD requests, or handle them themselves instead of passing the request to the CGI module. GET is safe. The request should either allow the server-side script to generate an error that can be presented to the user or, if there is no problem and data has already been uploaded, provide the data offset. I would recommend putting the offset in the response body rather than in a header; it will make the server-side solution more portable. I'll add a customizable function that reads the offset from the response.
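Roughly what I have in mind, as a sketch (the readOffset option name is just an illustration, not something that exists yet):
new Flow({
    target: 'upload.php',
    // Hypothetical option: lets the integrator decide how to read the
    // resume offset from the body of the offset-check request.
    readOffset: function (responseBody) {
        try {
            var data = JSON.parse(responseBody);
            if (data.error) {
                return -1;            // server refused the upload
            }
            return data.offset || 0;  // resume from this byte
        } catch (e) {
            return 0;                 // unreadable body: start from scratch
        }
    }
});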
Another thing that I would like is for the server side to be able to process varying chunk sizes. If a file has been split in 4 and two parts were uploaded, another upload process using a smaller chunk size (which would split the file in 8 parts) would not be able to restore the file properly. So Flow should first get the facts (how much remains to upload) and only then establish what the chunks are. That is why I'm now rewriting FlowFile.bootstrap.
Hello everyone, flow.js is a really nice library to work with. I am also interested in contributing to this project, but I'm not sure where to start.
Well, Git is a must, and I can't imagine library development without it. It is not as hard to learn as you might think; take a look at these links: http://try.github.io/levels/1/challenges/1, https://help.github.com/articles/set-up-git.
For the testing function, the link you posted describes exactly the functionality I have in mind. I will stick to GET though, as I've seen servers reject HEAD requests, or handle them themselves instead of passing the request to the CGI module. GET is safe.
Could you give me some references about this issue?
I have completely dropped the feature that allows simultaneous uploads. It's a nice idea that will unfortunately never see a practical implementation. I'll argue my case as soon as I have some free time.
I agree, simultaneous uploads give us no benefit if we have only one upload server; more about this: http://kufli.blogspot.dk/2012/06/speedier-upload-using-nodejs-and.html I would like to hear your arguments too.
The request should either allow the server-side script to generate an error that can be presented to the user or, if there is no problem and data has already been uploaded, provide the data offset.
Maybe we need a custom response handling function here? The function would determine whether the request was successful. By default it would check that the response status is 200 and that the chunk's retry count is less than the maximum allowed.
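Something along these lines, as a rough default (building on the existing maxChunkRetries option):
// Illustrative default: a response counts as successful when the HTTP
// status is 200; otherwise retry until the retry limit is reached.
function defaultResponseHandler(xhr, chunk) {
    if (xhr.status === 200) {
        return 'success';
    }
    return chunk.retries < chunk.flowObj.opts.maxChunkRetries ? 'retry' : 'error';
}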
Another thing that I would like is for the server side to be able to process varying chunk sizes. If a file has been split in 4 and two parts were uploaded, another upload process using a smaller chunk size (which would split the file in 8 parts) would not be able to restore the file properly. So Flow should first get the facts (how much remains to upload) and only then establish what the chunks are. That is why I'm now rewriting FlowFile.bootstrap.
This would be great! Another issue in the bootstrap function: https://github.com/flowjs/flow.js/blob/master/src/flow.js#L872 It slices the file into lots of chunks at bootstrap and that takes time; I would prefer lazy chunk slicing with a buffer. Right now, if a file is 2GB and the chunk size is 4MB, the library generates 512 chunks at once. With lazy slicing, we could create one chunk for upload and prepare the next chunk for later.
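A rough sketch of what I mean by lazy slicing (the nextChunk name is made up; this is not how the current code works):
// Create the next chunk only when it is needed, instead of building all
// 512 of them in bootstrap().
function nextChunk(file) {
    var chunkSize = file.flowObj.opts.chunkSize;
    var index = file.chunks.length;               // chunks created so far
    if (index * chunkSize >= file.size) {
        return null;                              // the whole file is covered
    }
    var chunk = new Flow.FlowChunk(file.flowObj, file, index);
    file.chunks.push(chunk);                      // keep it for bookkeeping
    return chunk;
}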
Furthermore, we should write up a flow.js protocol example, just a simple one, something like this (Core Protocol): http://tus.io/protocols/resumable-upload.html#5 Then we would see what needs to be done. Maybe this should even be done before we start changing the code. That way we will stay clear about what the end result should be.
@codeperl I will write up some milestones in a few days; take a look at them. If you think you can manage one, just say so.
We could also borrow some ideas from the Dropbox upload API: https://www.dropbox.com/developers/core/docs#chunked-upload
Typical usage:
Send a PUT request to /chunked_upload with the first chunk of the file without setting upload_id, and receive an upload_id in return. Repeatedly PUT subsequent chunks using the upload_id to identify the upload in progress and an offset representing the number of bytes transferred so far. After each chunk has been uploaded, the server returns a new offset representing the total amount transferred. After the last chunk, POST to /commit_chunked_upload to complete the upload.
Maybe we should implement a commit step? That way, on the server side, we wouldn't need to check after each chunk upload whether the file upload has finished. Can you see any drawbacks in the Dropbox upload API?
Could you give me some references about this issue?
It is not an educated opinion, just my experience from implementing upload systems across a large number of servers. I've had trouble with many IIS servers that didn't have HEAD in the list of allowed verbs. I went with POST in the end, as I leave the full length of the URL to the user implementing the library. Stuffing the URL with variables might leave the user limited (for example, if he has to pass an absolute path in the URL).
I agree, simultaneous uploads give us no benefit if we have only one upload server. I would like to hear your arguments too.
Uploading one file to multiple servers at the same time would definitely give a nice speed boost, but it is currently out of the question for most people. Uploading a file's chunks in random order creates a big problem on the server side that outweighs the marginal speed gain: it requires the server to keep the chunks, with their numbers, in a temporary place until it has all of them to put together. This is almost unusable with large files, which are the main point of doing this in the first place. An 8GB file uploaded in 20MB chunks means 400 chunks. Imagine how long the user would have to wait after the last chunk for the server-side script to join 400 chunks into an 8GB file, assuming the process succeeds and the script doesn't get killed for running too long or using too many server resources. If the chunks are uploaded in order, you keep only one temporary file, to which you append each chunk on each request, until it is done.
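To illustrate the "append in order" idea (a minimal Node.js sketch, not my actual server code; names are illustrative):
var fs = require('fs');
// Append the received chunk to a single temporary file and rename it when
// the last chunk has arrived, instead of joining hundreds of chunk files.
function appendChunk(tmpPath, chunkBuffer, isLastChunk, finalPath, callback) {
    fs.appendFile(tmpPath, chunkBuffer, function (err) {
        if (err) return callback(err);
        if (!isLastChunk) return callback(null);
        fs.rename(tmpPath, finalPath, callback);
    });
}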
Maybe we need a custom response handling function here? The function would determine whether the request was successful. By default it would check that the response status is 200 and that the chunk's retry count is less than the maximum allowed.
The reason I don't want to rely only on HTTP status codes is the simple fact that if PHP fails, the status is 200 OK even if the body contains an ugly error. There are actually quite a few cases where the status is 200 and the request got trashed. The user should be allowed to adapt this to the particularities of his setup.
It slices the file into lots of chunks at bootstrap and that takes time; I would prefer lazy chunk slicing with a buffer.
I have separated the calculation of the chunks from the bootstrap method, because it needs to happen after getting the stats from the server. However, I don't think the chunks are actually created there; only their number and offsets are. The actual data is read only when the current chunk gets uploaded.
Can you see any drawbacks in the Dropbox upload API?
There is a dilemma regarding chunk size that needs to be taken into consideration. For compatibility with limited servers (post_max_size, upload_max_filesize, the web server's limit on request body size, etc.) you want the chunks to be as small as possible. For fault tolerance you also want the chunks to be as small as possible, so that you don't lose much time if a chunk fails. So far so good. The thing is, the bigger the chunk, the smaller the number of chunks, and the fewer the chunks, the less time gets spent waiting for the server to process and reply. In most cases the server-side processing takes time (connecting to databases, etc.), and even one second between chunks, for a 60-chunk file, is one minute of extra waiting. So we can say that the bigger the chunk, the shorter the upload time, which translates to faster upload speeds, as close as possible to a non-chunked upload. So for efficiency you want the chunks as big as possible. You have "speed", "fault tolerance" and "portability". For the project I am improving this upload tool for (http://www.filerun.com) the priority goes: "portability" (it needs to work), then "fault tolerance" (it needs to work reliably), then "speed" (at a decent speed). You can of course make it very portable and fault tolerant with a chunk size of 1MB, but that will make it painfully slow for files larger than 50MB. Nowadays, though, people upload files larger than 6GB on a daily basis; I have a client working with PSD files of 2GB each. So we should aim for the largest chunk size that doesn't hurt fault tolerance too much and keeps the server happy. I would say a chunk size of 100MB over a decently reliable network connection is a reasonable case. Now imagine having to upload a 100MB chunk just to have the server tell you something like "A file with the same name already exists". That would be frustrating. So the server-side checks need to be done before proceeding with the actual work. That is why I think the Dropbox protocol sucks :)
P.S. I spent a few hours last night on the library and I already have it working pretty much like I wanted. I will test it a bit more, tweak the code a bit more (I don't like how much CPU the progress calculation is using), and then share it with you. I must say, it was very easy to work with the code; it is nicely organized.
Another change that I'm making: pausing the queue should be separate from pausing individual files. I can pause one file, so that file is skipped when its turn comes, while the rest of the files keep uploading. If I pause the whole queue, the transfer stops. But when I resume the queue, the transfer continues while the specifically paused file remains paused.
I hope it's OK if I use this discussion thread to document my changes, so that you can understand what I've done once I share my code. There was a logical mess in the order of reporting progress. Instead of FlowChunk reporting progress to FlowFile and FlowFile reporting progress to Flow, it was the other way around: Flow calculated progress for the FlowFiles and FlowFile calculated progress for the FlowChunks. I've fixed that and the associated performance issue.
Another thing "progressCallbacksInterval" seems like a good idea, but it fixes just half of the problem. Users don't care about how often the progress should be reported, they care when actual progress made. The user wants to know about the percentage when there is at least 1% (or even 0.X, but not 0.XXXXXXXXXX) of the file uploaded, so the progress event should be executed only when that's the case. In most cases the callback updates the user interface, and DOM manipulation is expensive, you don't want to resize elements quarters of a pixel, but at least one pixel. I'm adding a solution to that as well ;)
Another change that I'm making: pausing the queue should be separate from pausing individual files. I can pause one file, so that file is skipped when its turn comes, while the rest of the files keep uploading. If I pause the whole queue, the transfer stops. But when I resume the queue, the transfer continues while the specifically paused file remains paused.
Well spotted, this must be done. I will file a milestone for this.
About the 200 status on PHP fatal errors, we can avoid this: http://stackoverflow.com/questions/2331582/catch-php-fatal-error
I went with POST in the end, as I leave the full length of the URL to the user implementing the library. Stuffing the URL with variables might leave the user limited (for example, if he has to pass an absolute path in the URL).
Well yes, the library would gain more flexibility.
Another thing "progressCallbacksInterval" seems like a good idea, but it is just a poor solution to the problem. Users don't care about how often the progress should be reported, they care when actual progress made. The user wants to know about the percentage when there is at least 1% (or even 0.X, but not 0.XXXXXXXXXX) of the file uploaded, so the progress event should be executed only when that's the case. In most cases the callback updates the user interface, and DOM manipulation is expensive, you don't want to resize elements quarters of a pixel, but at least one pixel. I'm adding a solution to that as well ;)
This parameter is also used for upload speed and time-remaining calculations, and those depend on time, not on total upload percentage. Another drawback of this approach is that callbacks would be rare for large files and dense for small ones. If we track every 0.x% of progress, there will be 1000 callbacks for any file, large or small.
First thoughts on how the flow.js protocol should look:
A POST request is used to create a new resource or to resume an old one.
The first request SHOULD NOT carry any file data, only information about the file.
The content type MUST be multipart/form-data.
The request MUST include form data with the file size, file name and identifier. The identifier will later be used to resume the file upload.
The request MAY include additional data.
The second and later requests MAY contain the file blob and MUST set the offset information.
Servers MUST acknowledge a successful POST operation with a 200 Ok status. The response MUST contain an Offset header. The value MUST be an integer that is 0 or larger. The response MAY contain additional data, and the client may perform additional validation.
Example:
The file is an image named img.png, 100 bytes in size. By default the identifier is created by concatenating the file size and the file name. Making the first request:
POST /files HTTP/1.1
Content-Type: multipart/form-data
Content-Disposition: form-data; name="fileSize"
100
Content-Disposition: form-data; name="fileName"
img.png
Content-Disposition: form-data; name="identifier"
100-img.png
Response:
HTTP/1.1 200 Ok
Offset: 0
Starting the upload. The request is the same, but it includes the file blob and the offset. The blob is 30 bytes in size.
Request:
POST /files HTTP/1.1
Content-Type: multipart/form-data
Content-Disposition: form-data; name="fileSize"
100
Content-Disposition: form-data; name="fileName"
img.png
Content-Disposition: form-data; name="identifier"
100-img.png
Content-Disposition: form-data; name="offset"
0
Content-Disposition: form-data; name="file"; filename="blob"
Content-Type: application/octet-stream
Response:
HTTP/1.1 200 Ok
Offset: 30
However, this example does not include the final step. How should the client tell that this is the final chunk? Maybe the request could contain a parameter indicating that this is the final request? Or maybe we should use simple math: offset + current blob size == fileSize?
Another question: should the request include currentChunkSize? In PHP we could use $_FILES["blob"]["size"] instead.
Also, I haven't mentioned error handling; can we keep the same logic? Retry the request if the response status is not in the permanent-errors list, otherwise stop the upload and report the error to the client.
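The "simple math" variant would boil down to something like this on the server (just a sketch, the names are made up):
// No explicit commit request or "last chunk" flag needed: the server can
// detect completion on its own.
function isUploadComplete(offset, blobSize, fileSize) {
    return offset + blobSize >= fileSize;
}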
If we track every 0.x% of progress, there will be 1000 callbacks for any file, large or small.
Using a combination of both time and actual progress should work well for both small and large files, and for large files uploaded in either large or small chunks.
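Roughly like this (minProgressDelta is just an illustrative name, not an actual option):
// Fire the progress callback only when enough time has passed AND the
// percentage has actually moved enough to be visible in the UI.
function shouldReportProgress(state, now, currentProgress, opts) {
    var timeOk = now - state.lastReportTime >= opts.progressCallbacksInterval;
    var deltaOk = currentProgress - state.lastReportedProgress >= opts.minProgressDelta;
    if (timeOk && deltaOk) {
        state.lastReportTime = now;
        state.lastReportedProgress = currentProgress;
        return true;    // worth touching the DOM
    }
    return false;       // nothing visible has changed yet
}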
First thoughts on how the flow.js protocol should look:
Looking great.
However, this example does not include the final step. How should the client tell that this is the final chunk? Maybe the request could contain a parameter indicating that this is the final request?
Yes. My version now submits the following two boolean variables: flowIsFirstChunk and flowIsLastChunk. If you wait a bit, just a day or so, I will send you my full working code and you can see my changes. I've put quite a bit of work into it.
Another question: should the request include currentChunkSize? In PHP we could use $_FILES["blob"]["size"] instead.
The server should not care about the size of the chunk; it simply receives however much data arrives over the network from the client.
About the 200 status on PHP fatal errors, we can avoid this: http://stackoverflow.com/questions/2331582/catch-php-fatal-error
For my case, I'm not trusting the 200 status at all. Almost no hosting service has its servers configured to return HTTP 500 on PHP errors, and PHP errors can be very diverse, from parsing errors and warnings to silent failures. I am also not putting the server's offset in an HTTP header, which might get lost through proxy and reverse-proxy servers; I'm including it in the response body. You can of course choose to do it differently. I'm just sharing my opinion and choice.
How about making the chunk size dynamic and recalculating it after each uploaded chunk? For example, if we start with a 4MB chunk and it gets uploaded in 4 seconds, the client has at least a 1MB/s connection and we can adjust the chunk size to something more appropriate. How about making the chunk size as big as the client can transfer in 1 minute? The chunk size would be calculated by multiplying the average upload speed by a time interval; in this example it would be 1MB/s * 60 seconds = a 60MB chunk. For this feature the library would need four more parameters: maxChunkSize, minChunkSize, defaultChunkSize and chunkSizeAdjustInterval.
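A sketch of the calculation (all four parameter names are proposals, nothing of this exists yet):
function adjustChunkSize(opts, bytesUploaded, secondsTaken) {
    var avgSpeed = bytesUploaded / secondsTaken;          // bytes per second
    var next = avgSpeed * opts.chunkSizeAdjustInterval;   // e.g. 60 seconds worth
    // Clamp between the configured limits.
    return Math.min(opts.maxChunkSize, Math.max(opts.minChunkSize, next));
}
// 4MB uploaded in 4 seconds -> 1MB/s -> next chunk would be about 60MB
adjustChunkSize({minChunkSize: 1 << 20, maxChunkSize: 100 << 20,
    chunkSizeAdjustInterval: 60}, 4 << 20, 4);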
I am also not putting the server's offset in an HTTP header, which might get lost through proxy and reverse-proxy servers; I'm including it in the response body.
Managing headers in the library adds a bit of complexity; forcing the server to always return JSON would be a simpler solution.
How about making the chunk size dynamic and recalculating it after each uploaded chunk?
I love the idea. The web upload process can definitely be made much "smarter" than it currently is. I was also thinking that we should keep count of the chunks that had to be retried and, if it reaches a predefined threshold, automatically lower the chunk size, hoping for fewer failures, in the same way YouTube lowers the video quality when it detects poor bandwidth.
I finally managed to get the overall progress tracked properly, including the removal of the temporary progress for data uploaded before the XHR was forcefully aborted. That data was added to the progress but not subtracted if the chunk didn't complete. So if you now pause a 100MB chunk right in the middle, the file's progress and the overall progress go back by 50MB, which is what should happen, as it reflects reality. However, I'm writing this note as a TODO item, because this can be improved. The user has no idea that chunking is going on in the background, so if he pauses the upload and the chunk size is large, say 100MB, it would be a pity to completely discard 90MB of an uploaded chunk just to pause at that instant, instead of waiting a moment for the chunk to complete. We could add a timer and a threshold for that.
P.S. Keeping track of progress while taking failures into account turned out to be more challenging than I thought :P However, it will be flawless. And it will allow multiple files (not chunks) to be uploaded at the same time while tracking the overall progress effortlessly, without killing the CPU.
What do you think about multiplying "flowObj.opts.chunkRetryInterval" by "FlowChunk.retries" every time a chunk fails, to increase the wait time between attempts? Or at least doubling the retry interval with every failure. This should prevent clients from continuously flooding the server when there is trouble.
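Roughly (chunkRetryInterval and FlowChunk.retries already exist; the multiplication is the proposed change):
// Back off a little more after every failed attempt for the same chunk.
function nextRetryDelay(chunk) {
    var base = chunk.flowObj.opts.chunkRetryInterval || 1000;
    return base * (chunk.retries + 1);               // linear: 1x, 2x, 3x ...
    // or exponential: return base * Math.pow(2, chunk.retries);
}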
OK, so here's my code: http://bit.ly/1dK1pnk I did quite a bit of testing, but I'm sure it can use more, especially fault-tolerance testing. I made so many changes that it would take me quite a while to remember and list them all. They are mostly about what I have written in this discussion. The biggest changes are these two:
codeperl -> you said you wanted to contribute; you can do that by downloading and testing this version.
What do you think about multiplying "flowObj.opts.chunkRetryInterval" by "FlowChunk.retries" every time a chunk fails, to increase the wait time between attempts? Or at least doubling the retry interval with every failure. This should prevent clients from continuously flooding the server when there is trouble.
I opened an issue for this two days ago: https://github.com/flowjs/flow.js/issues/6
Thanks for sharing your work, I will review it tomorrow.
I opened an issue for this two days ago: #6
OK, great, so I see there are more people who think it's a good idea. Cool! It should be implemented then, as it takes just a line of code.
@vvllaadd, just got the zip. Hope I can start very soon.
A quick overview of the new vars being passed to the server:
flowGetOffset - boolean, marking the request that checks for already uploaded data. The server should return the number of uploaded bytes in the response body, or 404 if no data has been uploaded so far (see the sketch below)
flowIsFirstChunk - boolean, marking the first uploaded chunk of the file
flowIsLastChunk - boolean, marking the last uploaded chunk of the file
flowCurrentChunkSize - no change
flowTotalSize - no change
flowFilename - no change
flowRelativePath - no change
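Just to illustrate the contract for the offset check (a sketch, not part of the shared code; params stands for the parsed request data and lookupUploadedBytes is a made-up helper):
function offsetResponse(params, lookupUploadedBytes) {
    if (!params.flowGetOffset) {
        return null;                                     // a normal chunk upload
    }
    var uploaded = lookupUploadedBytes(params.flowFilename, params.flowTotalSize);
    if (!uploaded) {
        return {status: 404, body: ''};                  // nothing stored yet
    }
    return {status: 200, body: String(uploaded)};        // resume offset in the body
}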
A fix for progress monitoring for small files, files that are uploaded in a single small request:
// Inside FlowChunk; "$" refers to the chunk instance.
/**
 * The size of this chunk
 * @type {number}
 */
this.size = this.endByte - this.startByte;
this.progressHandler = function(event) {
    if (event.lengthComputable) {
        // Clamp to the chunk size; event.loaded/event.total can exceed it
        // because they also count the rest of the request, not just the file data.
        $.loaded = Math.min(event.loaded, $.size);
        $.total = Math.min(event.total, $.size);
        if ($.loaded > $._lastLoaded) {
            // Push only the newly transferred bytes up to the file and to Flow.
            var loadedNow = $.loaded - $._lastLoaded;
            $.fileObj.completedBytes += loadedNow;
            $.flowObj.completedBytes += loadedNow;
        }
        // _lastLoaded remembers what the previous progress event reported.
        $._lastLoaded = event.loaded;
    }
    $.fileObj.chunkEvent('progress');
};
(event.loaded can be larger than the chunk itself because it also counts the request overhead, not just the file data)
Another improvement I will make is to skip the "FlowFile.getOffset" call for tiny files (files that would take less than a couple of seconds to transfer), as it only adds delay when uploading a few hundred small text files.
You can download an updated version using the same link: http://bit.ly/1dK1pnk
Some of the changes:
Here's the constructor I use with it:
new Flow({
    target: 'upload.php',
    chunkSize: 2097152,
    progressCallbacksInterval: 100,
    maxChunkRetries: 3,
    resumeLargerThan: 10485760,
    validateChunkResponse: function(status, message) {
        if (status == '200') {
            try {
                var rs = Ext.util.JSON.decode(message);
            } catch (er) {
                return 'booboo';
            }
            if (rs && rs.success) {
                return 'success';
            }
        }
        return 'booboo';
    },
    validateChunkResponseScope: this,
    query: {path: 'some/path'}
});
This version should be quite stable. I have tested it quite a lot and the fault tolerance works fine; I'm getting close to my goal.
Next on my todo list:
Simultaneous file uploading to one backend does no good.
To handle this feature, the target option should be renamed to targets. E.g.:
targets: ['http://srv1.example.com', 'http://srv2.example.com']
The simultaneous file upload count would then be equal to targets.length.
Obviously, this feature would be useful to the client only when it has more upload bandwidth than a single server can handle, so we would have to do some math here and compare the average upload speed of one file against all files.
If the average upload speed does not increase with an increased simultaneous uploads count, the library should decrease the number of simultaneous uploads.
If the number of simultaneous uploads is less than targets.length, targets should be picked in round-robin style.
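Picking targets in round-robin style could be as simple as this (targets being the proposed option):
function pickTarget(flow) {
    // Cycle through the configured targets, one per started upload.
    flow._targetIndex = (flow._targetIndex || 0) % flow.opts.targets.length;
    return flow.opts.targets[flow._targetIndex++];
}
// pickTarget(flow) -> 'http://srv1.example.com', then 'http://srv2.example.com', ...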
removing the tons of JS closures
I am thinking of writing flow.js v3 in closure style where possible. The library will be much smaller when minified.
Simultaneous file uploading to one backend does no good.
Not entirely true. What does no good is uploading two large chunks at the same time to the same server: the bandwidth gets split in two, and it takes just as long as uploading the two chunks one after the other at double the speed. There is, however, a case where simultaneous upload can be a really big help: uploading many small files. What most people calculating upload speeds don't consider is the server-side processing time. In many cases the server takes just as long to process a 200MB chunk as a 200KB one, because the server-side program still needs to connect to databases, check authentication, etc. When uploading two large files in a few large chunks, that wait time is not important. But when you have to upload 2000 small files with an average size of 200KB (a larger dynamic website, for example), the accumulated wait time can be enormous. For a small file, the wait can be much longer than the time the browser took to complete the request. So why wait for the server to reply when you can complete another request in the meantime? Let the browser wait for multiple server replies at the same time.
I promised a client that I would make a browser upload function at least as efficient as an FTP program. At this time there is no such thing online; there is no HTML upload method as reliable as FTP for any amount of data. It's not only about uploading large files, but also many small ones. Would you consider uploading a WordPress or any other similar CMS-based website using Flow, without zipping the site first? Most probably not. And what if you don't have a good option for unzipping on the remote server? Back to FTP...
Good point! How about making the number of simultaneous file uploads dynamic? We have a chunk size and the size of every file. I suggest calculating it like this: we could transfer as many files as fit within a single chunk size. E.g.: with a chunk size of 4MB we could transfer one file of 2MB and two files of 1MB at the same time. However, simultaneous uploads should have a hard limit, so that the client doesn't try to upload 1000 files simultaneously, which browsers don't support. So the final formula would be: transfer as many files as fit in a single chunk size, but no more than the client-defined hard limit.
By default the hard limit should be based on browser limits: http://www.stevesouders.com/blog/2008/03/20/roundup-on-parallel-connections/
Also, this would play nice with dynamic chunk size calculation.
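A sketch of that formula (the names are made up):
// Fit as many of the queued files as possible into one chunk size, capped
// by a hard browser limit; always upload at least one file.
function simultaneousUploadsFor(queuedFiles, chunkSize, hardLimit) {
    var count = 0, used = 0;
    for (var i = 0; i < queuedFiles.length && count < hardLimit; i++) {
        used += Math.min(queuedFiles[i].size, chunkSize);
        if (used > chunkSize && count > 0) break;    // the next file no longer fits
        count++;
    }
    return Math.max(1, count);
}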
I made some quick tests comparing the total upload time of uploading small files one by one versus simultaneously. I have only tested Chrome v32. With 2 simultaneous uploads the upload time drops by 50%, which is awesome. For some reason Chrome doesn't really use more than 2 connections to the same host, so increasing that seems to make no difference. I'll return soon with an update on my library, which includes the "maxSimultaneousTransfers" configuration along with some other fixes and optimizations.
I'm quite happy with the progress so far; uploading a thousand small files or one gigantic (20GB) file is a piece of cake for me now :D
I was thinking about the simultaneous uploads issue, and it seems we are trying to solve it the wrong way. Instead of uploading one file in one request and another file in a concurrent request, we should upload all of them in one request, as long as the request does not exceed the chunk size. This would be a batch upload. For example: a client needs to upload a WordPress site; it is 16MB in size and has 1162 files. The library chunk size is 4MB. So all we need is 4 requests to upload it instead of 1162. In my opinion this would be a huge improvement for both the client and the server.
This might bring some complexity, but it's worth a shot.
Files could be transferred in the XHR as an array:
form.append('file[' + file.uniqueIdentifier + ']', blob, file.name);
form.append('file[' + file.uniqueIdentifier + '][name]', file.name);
form.append('file[' + file.uniqueIdentifier + '][size]', file.size);
form.append('file[' + file.uniqueIdentifier + '][offset]', file.offset);
Once this is done, batch upload could be used for large files as well. If a client has to upload 100 files of 5MB each, in 4MB chunks, then instead of 200 requests the library could do it in 5MB * 100 / 4MB = 125 requests. This means that if a file's last chunk is 1MB, the library should append a second file, or part of it, to the same request.
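Packing the queue into such batches could look roughly like this (purely illustrative, nothing of it is implemented):
// Split the queued files into batches, each no larger than one chunk size.
function buildBatches(files, chunkSize) {
    var batches = [], current = [], free = chunkSize;
    files.forEach(function (file) {
        var remaining = file.size;
        while (remaining > 0) {
            var piece = Math.min(remaining, free);
            current.push({file: file, offset: file.size - remaining, bytes: piece});
            remaining -= piece;
            free -= piece;
            if (free === 0) {                // batch is full, start a new one
                batches.push(current);
                current = [];
                free = chunkSize;
            }
        }
    });
    if (current.length) batches.push(current);
    return batches;
}
// 100 files of 5MB with a 4MB chunk size -> 125 batches/requests.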
You are so right. That would be a huge improvement. Simultaneous uploads are still very useful though, and after my tests I am now convinced it's worth uploading more than one file at a time. As soon as I get that working properly (there are still issues with the queue management and the progress reporting) I will share my code and work on the idea of bundling multiple small files into fewer requests (which can also be simultaneous). I had limited time to work on this the past few days, but I'm getting there.
Could you run more tests with simultaneous uploads? Chrome should support more than two connections to a single host for sure, or try Firefox. If we can't see any improvement with 3 or more connections, then something is fishy. Maybe the issue is simply that opening a connection takes time, so maybe we should open the next connection once we have finished uploading the previous chunk? The condition to open the next connection would be:
xhr.upload.onprogress = function(event) {
    if (event.loaded == event.total) {
        // upload finished, waiting for server response
        // while server is processing, we could send next chunk
    }
};
Although this is still a simultaneous upload, only one connection is open most of the time.
My version now submits the following two boolean variables: flowIsFirstChunk and flowIsLastChunk
Be careful with these, because with simultaneous uploads the first chunk might get uploaded after the last chunk.
I would still recommend dropping the simultaneous uploads feature, because with batch uploads its benefit would be negligible.
Be careful with these, because with simultaneous uploads the first chunk might get uploaded after the last chunk.
The only way I am using simultaneous uploads is with separate files, not with chunks. Chunks are always uploaded in their proper order; letting chunks upload in random order is a very, very bad idea.
I would still recommend dropping the simultaneous uploads feature, because with batch uploads its benefit would be negligible.
I still think it's a useful feature. Imagine a queue with one large file, say 2GB, followed by 50 small files of 1MB or less. With one transfer at a time you have to wait quite a while for the large one to complete, unless you pause it to let the small ones through. With two simultaneous uploads, the small files get uploaded before the large one completes, without extending the transfer time of the large file by much. Batch uploading will take quite some work; once it is implemented, more tests and comparisons can be made.
The only way I am using simultaneous uploads is with separate files, not with chunks. Chunks are always uploaded in their proper order; letting chunks upload in random order is a very, very bad idea.
Oh, this might bring some complexity.
I still think it's a useful feature. Imagine a queue with one large file, say 2GB, followed by 50 small files of 1MB or less. With one transfer at a time you have to wait quite a while for the large one to complete, unless you pause it to let the small ones through. With two simultaneous uploads, the small files get uploaded before the large one completes, without extending the transfer time of the large file by much.
Well, if we have two large files at the beginning of the queue, this does not help.
To keep the library simple, it should use the same logic for all files. I think batch uploading is the way to go, and it solves all the issues above. Although I haven't tried to implement it yet, and maybe there are some unknown drawbacks.
Oh, this might bring some complexity.
Already implemented it.
Well, if we have two large files at the beginning of the queue, this does not help.
It doesn't do any harm either. The user can pause one of the files if he wants to. Most people upload very large files individually and only small files in bunches.
I will post today what will probably be my last public contribution. My code is drifting away from the original, and this work is not done in my free time; it is backed by a business.
So, here's my latest version: http://bit.ly/1dK1pnk
The changes:
If this version becomes the default Flow code, I am considering contributing pull requests for the rest of the improvements that can be made (like "dynamic chunk size" and "bundle small files into fewer requests").
Enjoy!
P.S. Shameless plug: don't forget to check out FileRun (http://www.filerun.com), the project where this code will go.
This version will not become the next version of flow.js, because:
However, I am happy that you are developing this version, because you have come up with some nice ideas.
Yeah, the bureaucratic things I leave to you :) Tests are nice, but I don't have time to write them, so I'll do my best to manually test the most common cases and wait for actual users to report other possible issues :P Anyway, I hope my code and ideas will be of some use to you or anybody else, as flow.js was of use to me.
I uploaded a file using the Node.js example, but there seem to be some residual files left over in ./tmp. I'm not sure which one is the actual file and what I should delete.
Yo - any movement on this? I'm not sure if these changes were ever merged in. I'm more than willing to do the 'bureaucratic things' if that's all that's needed to get these great additions into the library. Also all of the bit.ly links are dead and I'd like to review the code that was submitted.
I have started refactoring this lib on the develop branch https://github.com/flowjs/flow.js/tree/develop; it has the basic structure with well-covered tests. There is currently no HTML example/demo or any JS documentation. It would be great to find a developer who would like to contribute.
Quick question -- are any of the changes that vvllaadd implemented present in the develop branch? Many of the vars and functions he mentioned adding seem to be absent.
As soon as I get a bit of free time I will start a new public project based on my code. It has been successfully tested over one year with thousands of users and millions of uploads. If I don't post anything here in a week or two, write a message to remind me ;)
Awesome - I'll put it on my calendar.
Any updates on the new project @vvllaadd? :)
@Patrik-Lundqvist, @marcstreeter and anybody else interested: you can watch this project: https://github.com/vvllaadd/filerun . I still need to find some free time to start it up, but it will certainly happen during February.
The project at the link above has been started and the initial code is available for download, with an included example.
Hi, I have an example in Java; it works, but it still has some errors: https://github.com/jdc18/ng-flow-with-java/tree/master/ng-flow-java I updated the Java resumable example, if anybody wants to test it, and it works with CORS too. This is my first project, so any improvement to the code is welcome.
How does the joining of chunks work in the library, or is it something that needs to be handled on the server side?
Hi,
Well done with the library! Although there currently doesn't seem to be much difference between this and Resumable.js, I prefer Flow's code structure. Unfortunately it inherits the same big performance issue. The "testChunks" idea is nice, but it is useless in its current form. If, let's say, one has uploaded 5 of 10 chunks in a previous session and wishes to resume the transfer, wouldn't it be more practical for Flow to send only ONE GET request to retrieve the size of what has already been uploaded to the server and continue from there, instead of testing all chunks from 1 to 10? There are a couple more big problems, but let's start with this. I am willing to collaborate.
Best regards,
Vlad