aws / aws-sdk-js-v3

Modularized AWS SDK for JavaScript.
Apache License 2.0

Please update guidance on how to stream result.Body from GetObjectCommand on NodeJS #5582

Closed codypenta closed 5 days ago

codypenta commented 9 months ago

Describe the issue

The following works only up to a certain file size, because it buffers the entire object in memory before writing it to disk.

try {
    const result = await this._client.s3Client.send(
        new GetObjectCommand(params),
    );

    //...

    // Avoid this: it loads the entire file into memory, which invokes
    // the OOM killer on Linux for large files.
    fs.writeFileSync(this._to, await result.Body.transformToByteArray());

    //...
} catch (err) {
    // Error handling was elided in the original snippet; rethrow so the
    // failure is not swallowed.
    throw err;
}

The SDK exposes .transformToWebStream(), which returns a web ReadableStream. That stream cannot be piped directly into the Node.js streams returned by createReadStream and createWriteStream (hence "web stream"). This looks like a regression, because older versions of the SDK allowed streaming the body straight to a file.

Links

https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/client/s3/command/GetObjectCommand/

codypenta commented 9 months ago

It seems web streams are supported in Node.js, but they are not interchangeable with Node.js streams: https://nodejs.org/api/webstreams.html
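
To illustrate the distinction: Node.js ships adapters between the two stream families, so a web ReadableStream can be wrapped as a Node.js Readable and then piped to a file. The following is a minimal sketch, assuming Node.js 18 or later. writeWebStreamToFile is a hypothetical helper name used only for illustration; it is not part of the SDK.

```javascript
const { createWriteStream } = require("node:fs");
const { Readable } = require("node:stream");
const { pipeline } = require("node:stream/promises");

// Hypothetical helper (not an SDK API): writes a web ReadableStream to a file.
async function writeWebStreamToFile(webStream, filePath) {
    // Readable.fromWeb() wraps a web ReadableStream in a Node.js Readable.
    const nodeReadable = Readable.fromWeb(webStream);

    // pipeline() applies backpressure, propagates errors from either side,
    // and resolves only after the write stream has finished.
    await pipeline(nodeReadable, createWriteStream(filePath));
}
```

With a helper like this, the result of result.Body.transformToWebStream() could be passed as webStream, so a single await covers both the read side and the write side.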

codypenta commented 9 months ago

I was able to get it working on Node.js 20 with SDK version "@aws-sdk/client-s3": "^3.454.0" by building a wrapper around createWriteStream.

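// Imports this snippet relies on (WritableStream is a global in Node.js 18+).
import { createWriteStream } from "node:fs";
import { GetObjectCommand } from "@aws-sdk/client-s3";

const result = await this._client.s3Client.send(
    new GetObjectCommand(params),
);

// Avoid this: it loads the entire file into memory, which invokes
// the OOM killer on Linux (and therefore Lambda) for large files.
// fs.writeFileSync(this._to, await result.Body.transformToByteArray());

const nodeWriteStream = createWriteStream(this._to, "binary");
const stream = new WritableStream({
    write(chunk) {
        // Respect backpressure: if the internal buffer is full, wait for
        // 'drain' before accepting the next chunk.
        if (!nodeWriteStream.write(chunk)) {
            return new Promise((resolve) => nodeWriteStream.once("drain", resolve));
        }
    },
    close() {
        // end() flushes pending writes and then emits 'finish'.
        nodeWriteStream.end();
    },
    abort(err) {
        // Destroying with an error emits 'error', which rejects the promise below.
        nodeWriteStream.destroy(err);
    },
});

// You cannot await just the pipeTo() because you must wait for
// both pipeTo AND createWriteStream to finish.
await new Promise((resolve, reject) => {
    nodeWriteStream.on("finish", resolve);
    nodeWriteStream.on("error", reject);
    // Forward pipeTo() failures so the outer promise cannot hang.
    result.Body.transformToWebStream().pipeTo(stream).catch(reject);
});

aBurmeseDev commented 9 months ago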

Hi @codypenta - apologies for not getting to you sooner but glad you got it working. Let us know if there's anything else we could help with.

codypenta commented 9 months ago

Can we update the docs (perhaps specifically the examples) to include this, or provide a built-in streaming mechanism on the Node.js side that hides this detail? As of now, guidance on streaming in Node.js with the latest versions of the SDK seems to be captured only across GitHub issues, not in the AWS docs.

bpottier commented 4 months ago

@codypenta I get an empty file when trying your solution. It seems there is A LOT of confusion about how to accomplish this with the latest SDK. Can we get some more input?

Edit: I wasn't awaiting the promise 🤦, but it would still be nice to get better docs on this or a built-in method like @codypenta mentioned.

aBurmeseDev commented 1 week ago

Refer to this section of the README on handling streams: https://github.com/aws/aws-sdk-js-v3?tab=readme-ov-file#streams
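
For readers landing here, a minimal sketch of streaming the body to disk without a web-stream wrapper. It assumes that in the Node.js runtime the Body returned by GetObjectCommand is a Node.js Readable; the code checks this rather than taking it for granted. saveBodyToFile is a hypothetical helper name, not an SDK function.

```javascript
const { createWriteStream } = require("node:fs");
const { Readable } = require("node:stream");
const { pipeline } = require("node:stream/promises");

// Hypothetical helper (not an SDK API): streams a response body to a file.
// Assumption: `body` is a Node.js Readable, as expected in the Node.js runtime.
async function saveBodyToFile(body, filePath) {
    if (!(body instanceof Readable)) {
        throw new TypeError("Expected a Node.js Readable body");
    }
    // pipeline() streams chunk by chunk with backpressure and resolves
    // once the file stream has finished, so the object is never fully
    // buffered in memory.
    await pipeline(body, createWriteStream(filePath));
}

// Usage inside an async function, assuming `client` is an S3Client and
// `params` holds Bucket and Key:
//   const { Body } = await client.send(new GetObjectCommand(params));
//   await saveBodyToFile(Body, this._to);
```

Awaiting the single promise returned by the helper avoids the empty-file problem reported earlier in the thread, since it settles only after the write side is done.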