cloudydeno / deno-aws_api

From-scratch Typescript client for accessing AWS APIs
https://deno.land/x/aws_api

WIP: Streaming request bodies (for S3) #24

Closed danopia closed 1 year ago

danopia commented 2 years ago

Turns out this is pretty tough API-wise, because S3 uploads must have a Content-Length header, so the only way to 'stream' an upload of truly unknown length is to initiate a multipart upload.

TillaTheHun0 commented 2 years ago

+1 for this API. Right now I am using putObject, which accepts a Buffer or, more broadly speaking, a Uint8Array, but I have to read the whole file into the buffer and send it.

With a streaming API, similar to the AWS SDK's upload, I could pipe a request body directly to S3, which would greatly improve performance.

danopia commented 2 years ago

With a streaming API, similar to the AWS SDK's upload, I could pipe a request body directly to S3, which would greatly improve performance.

Thank you for reporting your use-case!

So looking at upload(), that specific function is implemented by the AWS.S3.ManagedUpload class. It appears to chop your stream into individual 5MB segments and upload them with a Multipart Upload strategy. This is actually separate from streaming request bodies because each 'part' is buffered. So I will track managed/multipart uploads in a separate issue 😅

S3 only supports true streaming uploads if you know the length of the body upfront. That's what this PR ⬆️ implements. Maybe you know your object size ahead of time, in which case you don't actually need the multipart upload() (but the parallelization probably still helps speed with huge objects).

danopia commented 2 years ago

@TillaTheHun0 I have a working pass at chunked/parallelized S3 uploading in #31; please feel free to vet the behavior in your own stuff before I get it clean enough to merge. You can import { multiPartUpload } from "https://raw.githubusercontent.com/cloudydeno/deno-aws_api/89703336482f008f3f6ebd9f759370fd393ba362/lib/helpers/s3-upload.ts" and then call it with an S3 client as shown in #31's text. If you have any feedback, please comment on that PR. Thanks again!
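
For anyone following along, here's a rough, hedged sketch of how that could be wired up. The helper import URL is the pinned one above; the client setup follows the library's README pattern (module paths may differ depending on the version you pin), and the parameter names passed to multiPartUpload are illustrative assumptions only — see #31 for the actual call shape.

```ts
// Hedged sketch, not the final API: trying the draft multiPartUpload helper from PR #31.
// Client setup follows the README pattern; the service module path is an assumption
// and may differ depending on the library version you pin.
import { ApiFactory } from "https://deno.land/x/aws_api/client/mod.ts";
import { S3 } from "https://deno.land/x/aws_api/services/s3/mod.ts";
import { multiPartUpload } from "https://raw.githubusercontent.com/cloudydeno/deno-aws_api/89703336482f008f3f6ebd9f759370fd393ba362/lib/helpers/s3-upload.ts";

const s3 = new ApiFactory().makeNew(S3);

// Stream a local file without buffering the whole thing in memory first.
const file = await Deno.open("./big-backup.tar.gz", { read: true });

// Parameter names below are assumptions for illustration; check #31 for the real shape.
await multiPartUpload(s3, {
  Bucket: "my-bucket",
  Key: "backups/big-backup.tar.gz",
  Body: file.readable, // ReadableStream<Uint8Array>
});
```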

TillaTheHun0 commented 2 years ago

@danopia oh sweet, I will check it out! 👍

yogesnsamy commented 1 year ago

+1 for this API. Right now I am using putObject, which accepts a Buffer or, more broadly speaking, a Uint8Array, but I have to read the whole file into the buffer and send it.

With a streaming API, similar to the AWS SDK's upload, I could pipe a request body directly to S3, which would greatly improve performance.

@TillaTheHun0 May I know how to supply a Buffer to putObject? Looking at the s3 file, I could only supply one of the following values:

[screenshot: the accepted types for putObject's Body field]

TillaTheHun0 commented 1 year ago

@yogesnsamy the code is here

Basically we're using readAll, which accepts a Deno.Reader and reads the contents into a Uint8Array that we pass to putObject.
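
For reference, a minimal sketch of that pattern. The std version, file path, bucket, and key are placeholder assumptions (and readAll's import path has moved between std releases); `s3` is an S3 client built with this library's ApiFactory as in the earlier sketch.

```ts
// Minimal sketch of the buffer-then-putObject pattern described above.
// Assumptions: std version / readAll import path, file path, bucket, and key;
// `s3` is an S3 client constructed elsewhere with this library's ApiFactory.
import { readAll } from "https://deno.land/std@0.140.0/streams/conversion.ts";

const file = await Deno.open("./report.pdf", { read: true });
const body = await readAll(file); // buffers the entire file into a Uint8Array
file.close();

await s3.putObject({
  Bucket: "my-bucket",
  Key: "uploads/report.pdf",
  Body: body,
});
```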

yogesnsamy commented 1 year ago

@yogesnsamy the code is here

Basically we're using readAll, which accepts a Deno.Reader and reads the contents into a Uint8Array that we pass to putObject.

Many thanks @TillaTheHun0. It works.

danopia commented 1 year ago

Question for y'all: when calling PutObject with a Reader, do you know the byte size of your Reader upfront? That would make streaming easier to implement.

danopia commented 1 year ago

🚀 A managed-upload module (using S3 multipart) just shipped in v0.8.0. This is most useful for uploading large files (50MB and up) as it will break up your ReadableStream<Uint8Array> into multiple S3 requests.

🗒️ Also, hot tip: in cases where you aren't worried about the file fitting into memory, you can use a Response to easily buffer a ReadableStream<Uint8Array>:

```ts
const bodyBuffer = new Uint8Array(await new Response(bodyStream).arrayBuffer());
```
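
Putting that tip together with putObject, a quick sketch (file path, bucket, and key are placeholders; `s3` is an S3 client constructed as in the earlier sketches):

```ts
// Hedged sketch: buffer a ReadableStream<Uint8Array> via Response, then PutObject.
// File path, bucket, and key are placeholder assumptions; `s3` is an S3 client
// built with this library's ApiFactory.
const file = await Deno.open("./video.mp4", { read: true });
const bodyBuffer = new Uint8Array(await new Response(file.readable).arrayBuffer());
// file.readable closes the file automatically once it has been read to the end.

await s3.putObject({
  Bucket: "my-bucket",
  Key: "videos/video.mp4",
  Body: bodyBuffer,
});
```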

🗑️ This pull request is still in draft and I don't think it's going to land this time, because it's not very useful. True request streaming to S3 is only possible if the body's length is known upfront. That limitation means this library cannot just accept a ReadableStream<Uint8Array> on its own. The library could buffer the stream up for you, but that would really be a lie (it wouldn't actually be streaming), so I'm not immediately in favor of it 🤔