aws / aws-sdk-java-v2

The official AWS SDK for Java - Version 2
Apache License 2.0
2.16k stars 833 forks source link

SignerUtils#moveContentLength Opens a Stream Unnecessarily #5444

Open RealTYPICAL opened 1 month ago

RealTYPICAL commented 1 month ago

Describe the bug

SignerUtils#moveContentLength conditionally has to read the entire contents of an input stream in case a content length header isn't there. This is problematic because:

Expected Behavior

The stream should only be opened if the content length needs to be computed.

Current Behavior

It will open a stream and then close it for no purpose.

Reproduction Steps

AtomicBoolean opened = new AtomicBoolean(false);
s3Client.putObject(b -> b.bucket("bucket").key("key"), RequestBody.fromContentProvider(() -> {
    if (opened.getAndSet(true)) {
        throw new RuntimeException("Could only be opened once.");
    }
    return new ByteArrayInputStream("a".repeat(10).getBytes());
}, 10, "application/octet-stream"));

Possible Solution

SignerUtils#moveContentLength should take a ContentStreamProvider and only open the stream if it's necessary for computing the content length.

Additional Information/Context

No response

AWS Java SDK version used

2.26.22

JDK version used

openjdk version "21.0.1"

Operating System and version

Windows 10 and WSL2

debora-ito commented 1 month ago

@RealTYPICAL

Content-length is required for S3 PutObject, that's why the SDK will read the entire stream when no content-length is provided.

and only open the stream if it's necessary for computing the content length.

How do you know if it's necessary? Are you using a third-party solution that integrates with s3? Can you tell us more about your use case?

RealTYPICAL commented 1 month ago

Hi debora-tio,

Thanks for responding.

In the snippet I provided in the description, the content-length is already given as the second argument to fromContentProvider. There is a second factory method on RequestBody with the signature RequestBody#fromContentProvider(ContentStreamProvider, String) which would then need to compute the content-length from the InputStream returned from the ContentStreamProvider. This issue revolves around the fact that the stream is opened even though the content-length is already there.

This is the relevant code from SignerUtils.moveContentLength with some changes demonstrating what I mean:

public static long moveContentLength(SdkHttpRequest.Builder request, ContentStreamProvider payload /*Used to be `InputStream payload`*/) {
        Optional<String> decodedContentLength = request.firstMatchingHeader(X_AMZ_DECODED_CONTENT_LENGTH);
        if (!decodedContentLength.isPresent()) {
            String contentLength = request.firstMatchingHeader(Header.CONTENT_LENGTH).orElseGet(
                () -> String.valueOf(readAll(payload)) //<--- now the stream will only be opened if the content length header is missing
            );

            request.putHeader(X_AMZ_DECODED_CONTENT_LENGTH, contentLength)
                .removeHeader(Header.CONTENT_LENGTH);
            return Long.parseLong(contentLength);
        }

        request.removeHeader(Header.CONTENT_LENGTH);
        return Long.parseLong(decodedContentLength.get());
    }

    private static int readAll(ContentStreamProvider provider) {
        //stream only opened here
        try (InputStream inputStream = provider.newStream()) {
            byte[] buffer = new byte[4096];
            int read = 0;
            int offset = 0;
            while (read >= 0) {
                read = inputStream.read(buffer);
                if (read >= 0) {
                    offset += read;
                }
            }
            return offset;
        } catch (Exception e) {
            throw new RuntimeException("Could not finish reading stream: ", e);
        }
    }