aws / aws-sdk-js-v3

Modularized AWS SDK for JavaScript.
Apache License 2.0

[S3 Client] Errors if the Body of PutObjectCommand is a Readable type. #4979

Open gh-andre opened 1 year ago

gh-andre commented 1 year ago


Describe the bug

How is it possible that this major functionality has been broken since 2021 and is still not fixed, two years later?

https://github.com/aws/aws-sdk-js-v3/issues/2348

The AWS S3 client reports a header that the AWS team sets internally, within the package, as "not implemented" (Transfer-Encoding: chunked). This makes it impossible to use the same AWS S3 package both to get objects out of S3 and to put large objects into S3, and one needs to concoct awkward code for uploading using a completely different package (lib-storage). How does this make any sense?

SDK version number

@aws-sdk/client-s3, v3.370.0

Which JavaScript Runtime is this issue in?

Node.js

Details of the browser/Node.js/ReactNative version

Windows, Node v16.13.1

Reproduction Steps

See detailed instructions in the original issue:

https://github.com/aws/aws-sdk-js-v3/issues/2348

Observed Behavior

A NotImplemented error is returned when Transfer-Encoding: chunked is used within the package. For details, see the original issue:

https://github.com/aws/aws-sdk-js-v3/issues/2348

Expected Behavior

PutObjectCommandInput is expected to work as designed and documented, allowing Body to be set to an instance of Readable.

Possible Solution

No response

Additional Information/Context

Supporting Readable for putting objects into S3 is major functionality. I don't get how it can have been broken for two years.

yenfryherrerafeliz commented 1 year ago

Hi @gh-andre, sorry to hear about your frustration. I agree that this is an issue, but it seems to be a limitation with the service itself rather than with the SDK. I presume this because we can see here that the stream is correctly piped to the request. One workaround, though often not feasible, would be to know or calculate the length of the stream and provide that value in the "ContentLength" parameter of the "PutObjectCommand" command, but I know this is not ideal because we do not know how huge the stream can be, and we will only know its length once it has been completely read.
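
A rough sketch of that workaround, buffering the stream in memory so the length is known before the upload (the URL, bucket and key below are placeholders, not part of the SDK):

import {PutObjectCommand, S3Client} from "@aws-sdk/client-s3";
import https from "https";

// Buffer the whole stream so its length is known up front. This avoids
// chunked Transfer-Encoding, but is only practical for smaller objects.
const streamToBuffer = async (stream) => {
    const chunks = [];
    for await (const chunk of stream) {
        chunks.push(chunk);
    }
    return Buffer.concat(chunks);
};

const client = new S3Client({
    region: "us-east-2",
});

https.get("https://example.com/some-file.bin", async (response) => { // placeholder URL
    const buffered = await streamToBuffer(response);

    await client.send(new PutObjectCommand({
        Bucket: process.env.TEST_BUCKET, // placeholder bucket
        Key: process.env.TEST_KEY,       // placeholder key
        Body: buffered,
        ContentLength: buffered.length,  // explicit here, though the SDK can infer it from a Buffer
    }));
});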

I will bring this into discussion with the team to get more clarity and define next steps on this, and I will use this thread to provide updates.

Thanks!

Sample code to reproduce:

import {PutObjectCommand, S3Client} from "@aws-sdk/client-s3";
import https from "https";

// Resolves with the HTTP response, which is a Readable stream.
const fetchFileFromUrl = async (imageUrl) => {
    return await new Promise((resolve) => {
        https.get(imageUrl, (response) => {
            resolve(response);
        });
    });
};
const client = new S3Client({
    region: "us-east-2",
});
const body = await fetchFileFromUrl("https://cdn.pixabay.com/photo/2023/05/23/18/12/hummingbird-8013214_1280.jpg");
// Passing the stream as Body with no ContentLength reproduces the NotImplemented error.
const response = await client.send(new PutObjectCommand({
    Bucket: process.env.TEST_BUCKET,
    Key: process.env.TEST_KEY,
    Body: body,
}));

console.log(response);

Thanks!

gh-andre commented 1 year ago

@yenfryherrerafeliz Thank you for responding. I do hope that the team considers how important it is to handle streams and chunked transfer encoding.

it seems to be a limitation with the service itself rather than with the SDK

I have never used the S3 REST API directly, only Python and TypeScript, but a quick search seems to indicate that it is doable. It might be something newer that has not been incorporated into the SDK yet, but the functionality seems to be there.

https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-streaming.html

I will also note that, from the SDK user's POV, it appears that the SDK initiates chunked encoding internally and then reports that it's not implemented. If Content-Length is mandatory, perhaps it should not even try using Transfer-Encoding and should just report the missing Content-Length.

I know this is not ideal because we do not know how huge the stream can be

Exactly. There are basically two cases, and your sample covers one of them. AWS S3 users need to either:

a) change their protocol to propagate content length along with the stream, so ContentLength can be set in PutObjectCommandInput, even before the stream is read entirely, or

b) read the entire stream locally and then engage S3Client to put it into S3, with content length specified.

Both require awkward changes to the app architecture: propagating content length through app protocols in the first case, or storing large files locally in the second (some files I transfer are 1 GB+ in size).
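
To illustrate option (a): when the source is itself an HTTP response, the length can sometimes be taken from its Content-Length header and forwarded along with the stream, so nothing needs to be buffered. A rough sketch, assuming the source server actually sends that header (URL, bucket and key are placeholders):

import {PutObjectCommand, S3Client} from "@aws-sdk/client-s3";
import https from "https";

const client = new S3Client({
    region: "us-east-2",
});

https.get("https://example.com/large-file.bin", async (response) => { // placeholder URL
    // Length propagated alongside the stream, before the stream is consumed.
    const contentLength = Number(response.headers["content-length"]);

    await client.send(new PutObjectCommand({
        Bucket: process.env.TEST_BUCKET, // placeholder bucket
        Key: process.env.TEST_KEY,       // placeholder key
        Body: response,                  // the response is a Readable
        ContentLength: contentLength,    // known up front, so no chunked encoding is needed
    }));
});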

I'm hoping some action would be taken here to make GET and PUT operations symmetrical in how they use Readable and Writable, which would help a lot of apps using S3Client.

I also hope that the AWS Node team will improve the docs. As of now, things like S3Client.send are not documented at all, and it takes some experimentation to see whether the returned promise resolves when the stream is set up or when the transfer completes, etc. Some of the other docs are difficult to use. Take Full Signature on this page as an example: it shoots off the right edge of the screen with types, with only a tiny scroll bar to navigate.

https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/preview/Package/-aws-sdk-client-s3/Class/S3Client/

I do appreciate your response and hope that the AWS S3 team takes time to figure out how to handle streams consistently between different S3 operations. Thank you for jumping in and for the insights.

martinslota commented 1 year ago

According to https://github.com/aws/aws-sdk-js-v3/issues/2348#issuecomment-841002347 and the SDK upgrade instructions, one should use @aws-sdk/lib-storage to upload streams to S3.

I gave that a shot now but unfortunately I'm getting the same error:

Edit: I didn't test correctly. Using @aws-sdk/lib-storage worked out fine.
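
For reference, a minimal sketch of the @aws-sdk/lib-storage route with a Readable body (the bucket, key and local file name are placeholders):

import { createReadStream } from "fs";
import { S3Client } from "@aws-sdk/client-s3";
import { Upload } from "@aws-sdk/lib-storage";

const client = new S3Client({
    region: "us-east-2",
});

const upload = new Upload({
    client,
    params: {
        Bucket: process.env.TEST_BUCKET,          // placeholder bucket
        Key: process.env.TEST_KEY,                // placeholder key
        Body: createReadStream("large-file.bin"), // placeholder file; any Readable works
    },
});

// lib-storage uses multipart upload under the hood, so the total
// length does not need to be known in advance.
await upload.done();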

gh-andre commented 1 year ago

@martinslota Using two different packages for the same functionality isn't the approach I would recommend for anyone, to be honest.

What makes this issue even stranger, given how long it's been around without any attention from the S3 team, is that it is a self-inflicted problem: the S3 client internally uses S3 REST API functionality that results in this error (chunked transfers) and then reports the error to the user, as if it were the user that sent a bad header.

danursin commented 1 year ago

Glad to know I wasn't crazy when I hit this error. This is the first case where I've fallen back to aws-sdk v2 since switching to v3 for new apps.

The following does work if you need to do something similar and are willing to fall back to the v2 package. I'd much prefer to stick with the v3 package and have this resolved. I'm curious how the v2 implementation handles this; I haven't looked yet.

import { S3 } from "aws-sdk";

const s3 = new S3();

// Stream the source object out of S3 (v2 exposes a Readable here).
const stream = s3
    .getObject({
        Bucket: "source-bucket",
        Key: "example-key"
    })
    .createReadStream();

// Upload the stream; v2's upload() handles bodies of unknown length.
await s3
    .upload({
        Bucket: "destination-bucket",
        Body: stream,
        Key: "example-key"
    })
    .promise();

Of course, I know you could do a copy command for this bit. In reality I need to do some other processing line by line before uploading the real file again.
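
For comparison, a rough v3-only sketch of the same pattern, with line-by-line processing fed into @aws-sdk/lib-storage (transformLine is a hypothetical per-line step, and backpressure handling is omitted for brevity):

import { GetObjectCommand, S3Client } from "@aws-sdk/client-s3";
import { Upload } from "@aws-sdk/lib-storage";
import readline from "readline";
import { PassThrough } from "stream";

const client = new S3Client({ region: "us-east-2" });

// Hypothetical per-line processing step.
const transformLine = (line) => line.toUpperCase();

// Stream the source object; in Node.js the v3 Body is a Readable.
const { Body: source } = await client.send(new GetObjectCommand({
    Bucket: "source-bucket",
    Key: "example-key",
}));

// Processed lines are written into a PassThrough that feeds the upload.
const output = new PassThrough();
const upload = new Upload({
    client,
    params: {
        Bucket: "destination-bucket",
        Key: "example-key",
        Body: output,
    },
});

// Pump lines through the transform while the upload runs concurrently.
const pump = (async () => {
    for await (const line of readline.createInterface({ input: source })) {
        output.write(transformLine(line) + "\n");
    }
    output.end();
})();

await Promise.all([pump, upload.done()]);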

doaortu commented 4 months ago

In my case, it just spits out an InvalidAccessKeyId: The Access Key Id you provided does not exist in our records. error, but when I use plain text as the Body, the upload is successful. What a weird bug.

Using @aws-sdk/lib-storage works fine, though.