18F / bulk-storage

File hosting as a service.

Options for bulk uploading data in the browser #3

Open · konklone opened this issue 10 years ago

konklone commented 10 years ago

I think these all load large files fully into memory up front and then stream, but still:

konklone commented 10 years ago

Actually, after talking with @maxogden, looks like this actually is possible, at least for his filereader-stream library. Details at https://github.com/DamonOehlman/filestream/issues/9#issuecomment-58467791.
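Roughly, the reading side would look like this (a minimal sketch assuming filereader-stream's basic API, bundled with browserify; the logging is just illustrative):

```js
// Read a dropped File in bounded chunks instead of loading it whole.
var fileReaderStream = require('filereader-stream')

function readInChunks (file) {
  var reader = fileReaderStream(file) // readable stream of chunks

  reader.on('data', function (chunk) {
    // each chunk is a slice of the file read via FileReader, so memory
    // use stays flat no matter how big the file is
    console.log('read', chunk.length, 'bytes')
  })

  reader.on('end', function () {
    console.log('finished reading', file.name)
  })
}
```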

Add in websockets for the server transport, and I think you could pull off a 1TB upload in the browser, all the way into S3. However, websockets would require a server proxy in the middle, since S3 doesn't have a websockets API, and that means the server proxy would absorb all the bandwidth costs (so we'd sort of pay Amazon double for bandwidth).

I'll want to test out a) traditional XHR streaming (with a POST for every chunk) as the method of streaming the file read, just to compare against the Blob.slice approach, and b) normal XHRs for the server transport, possibly using S3's Multipart Upload API.
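For (a), the chunk-per-POST version is simple enough to sketch (the /upload endpoint and part numbering here are hypothetical; the 5MB chunk size matches S3's minimum multipart part size):

```js
// Slice the File manually and POST each chunk in sequence.
var CHUNK_SIZE = 5 * 1024 * 1024

function uploadInChunks (file) {
  var offset = 0
  var part = 1

  function sendNext () {
    if (offset >= file.size) return console.log('upload complete')

    // Blob.slice gives us a view of the file without reading it into memory
    var chunk = file.slice(offset, offset + CHUNK_SIZE)
    var xhr = new XMLHttpRequest()
    xhr.open('POST', '/upload?part=' + part) // hypothetical endpoint
    xhr.onload = function () {
      offset += CHUNK_SIZE
      part += 1
      sendNext()
    }
    xhr.send(chunk)
  }

  sendNext()
}
```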

max-mapper commented 10 years ago

one note is that you get free transfer within amazon infrastructure

also, hopefully we can make dat simple enough for you to run as your server and just configure it to handle the browser <-> s3 transfer without much custom dev work, just plugging in a couple plugins

konklone commented 10 years ago

Actually, all data transfer into S3 is free regardless of where it comes from. So if we could wire it up with no proxy, we could operate the upload service with 0 bandwidth cost (just PUT and storage cost).

Though actually, it looks like data transfer into EC2 from the Internet is free as well! So okay: whatever.

Amazon has done a wonderful job of cognitively manipulating people into establishing long-term dependencies on them by making all inputs zero-cost!

konklone commented 10 years ago

Work will proceed at https://github.com/konklone/upload-anything.

konklone commented 10 years ago

I didn't hang out with anybody for 2 days and I made this: http://bit.voyage/#bucket=[your-bucket]&key=[your-key]&secret_key=[your-secret-key]

Replace the params with your S3 bucket and credentials, turn on CORS for your S3 bucket, and you can drag and drop files of up to ~200GB into your browser and send them on a voyage into the cloud.
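(For anyone trying this, here's a sketch of turning on that CORS configuration with the aws-sdk -- the bucket name and origins are placeholders, and you'd want to tighten the origin in practice; the same thing can be done in the S3 console.)

```js
var aws = require('aws-sdk')
var s3 = new aws.S3()

// Allow the browser's cross-origin PUTs/POSTs to reach the bucket.
s3.putBucketCors({
  Bucket: 'my-bucket', // placeholder
  CORSConfiguration: {
    CORSRules: [{
      AllowedOrigins: ['*'], // tighten to your app's origin in practice
      AllowedMethods: ['GET', 'PUT', 'POST'],
      AllowedHeaders: ['*'],
      // exposing ETag lets the browser read part ETags for multipart uploads
      ExposeHeaders: ['ETag']
    }]
  }
}, function (err) {
  if (err) throw err
  console.log('CORS enabled')
})
```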

Some ideas on next steps are here: https://github.com/konklone/bit-voyage/pull/7. The short of it is, 1TB+ files into S3 are not going to be possible without a server-side proxy -- though there are more interesting destinations than just S3 possible here.

max-mapper commented 10 years ago

use the dat blob store api:

https://github.com/maxogden/abstract-blob-store
https://github.com/datproject/discussions/issues/5
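e.g. with the s3 backend it looks roughly like this (a sketch; the bucket and keys are placeholders):

```js
// every abstract-blob-store backend exposes the same small API,
// so the browser <-> s3 wiring stays swappable
var aws = require('aws-sdk')
var s3blobs = require('s3-blob-store')
var fs = require('fs')

var store = s3blobs({
  client: new aws.S3(),
  bucket: 'my-bucket' // placeholder
})

var ws = store.createWriteStream({ key: 'big-file.dat' }, function (err, metadata) {
  if (err) throw err
  console.log('stored as', metadata.key)
})

fs.createReadStream('big-file.dat').pipe(ws)
```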


konklone commented 10 years ago

@maxogden This is completely fascinating.

Is it the abstract blob store's role to handle resumption, as well? Or is that outside the spec's intended semantics? For example, with a 200GB file, I know I'm likely to have to allow users to perform the upload over multiple sessions. With S3, I can store the UploadId, the byte offset, and the part number and size, and I can make that happen. Is that something you intend to address or generalize?
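(For reference, here's roughly the shape of that with the raw S3 API in the JavaScript SDK -- a sketch assuming the UploadId and part state were persisted between sessions; all identifiers are placeholders.)

```js
var aws = require('aws-sdk')
var s3 = new aws.S3()

var PART_SIZE = 5 * 1024 * 1024 // S3's minimum part size (last part excepted)

// Resume a multipart upload using state saved from an earlier session.
function resumePart (bucket, key, uploadId, file, partNumber, offset) {
  s3.uploadPart({
    Bucket: bucket,
    Key: key,
    UploadId: uploadId,
    PartNumber: partNumber,
    Body: file.slice(offset, offset + PART_SIZE)
  }, function (err, data) {
    if (err) throw err
    // save (partNumber, data.ETag): completeMultipartUpload needs the
    // full list of { PartNumber, ETag } pairs once every part is in
  })
}

// s3.listParts({ Bucket: ..., Key: ..., UploadId: ... }) recovers which
// parts already made it, so the next offset can be computed on resume.
```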

Also, awesome that the s3-blob-store uses s3-upload-stream under the hood -- it's what I'm using right now, and it's a great library (that I've been chipping in some browser fixes for).

max-mapper commented 10 years ago

@konklone @mafintosh and I have talked about adding resumability to the API but haven't had time to work on that part yet. if you want to help flesh out that part of our API, we would be appreciative

konklone commented 10 years ago

OK, I opened up an issue at https://github.com/maxogden/abstract-blob-store/issues/8.