awslabs / aws-sdk-kotlin

Multiplatform AWS SDK for Kotlin
Apache License 2.0

S3Client is uploading files with empty Content-Encoding metadata #1358

Closed. turboDi closed this issue 4 months ago.

turboDi commented 4 months ago

Describe the bug

S3Client uploads files with empty Content-Encoding metadata if the contentEncoding parameter is not specified.

Expected behavior

If contentEncoding is not specified, S3Client shouldn't add empty Content-Encoding metadata. I expect the same result as this AWS CLI call:

aws s3 cp test.zip s3://test-bucket/test.zip

Current behavior

If contentEncoding is not specified, S3Client adds empty Content-Encoding metadata.

Steps to Reproduce

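// Imports were not shown in the original report; JUnit 5 is assumed for @Test.
import aws.sdk.kotlin.services.s3.S3Client
import aws.sdk.kotlin.services.s3.putObject
import aws.smithy.kotlin.runtime.content.asByteStream
import java.io.File
import kotlinx.coroutines.runBlocking
import org.junit.jupiter.api.Test

// Assumed stand-in for the reporter's own helper (not shown in the original):
// creates a temporary file and deletes it once the block finishes.
inline fun <T> useTempFile(block: (File) -> T): T {
    val file = File.createTempFile("s3-client-test", ".zip")
    try {
        return block(file)
    } finally {
        file.delete()
    }
}

class S3ClientTest {

    @Test
    fun test(): Unit = runBlocking {
        useTempFile { file ->
            // Copy the test resource into the temp file, closing both streams.
            S3ClientTest::class.java.getResourceAsStream("/test.zip")!!.use { input ->
                file.outputStream().use { output -> input.copyTo(output) }
            }

            S3Client {
                region = "us-east-2"
            }.use { client ->
                // Note: contentEncoding is deliberately not set.
                client.putObject {
                    bucket = "test-bucket"
                    key = "test.zip"
                    body = file.asByteStream()
                    contentType = "application/zip"
                }
            }
        }
    }
}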

Context

When files with an empty Content-Encoding are downloaded back from S3 via direct links or by other clients such as Ktor, those clients fail on the empty Content-Encoding header. I have to add hacks that treat an empty Content-Encoding as identity, but for clients I don't control, there is no workaround.
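The client-side hack mentioned above (treating an empty Content-Encoding as identity) amounts to a small normalization step that a download client runs before picking a decoder. The sketch below is a hypothetical helper, not a Ktor or AWS SDK API; the name normalizeContentEncoding and the rule that a missing or blank value means "identity" are assumptions for illustration.

```kotlin
// Hypothetical helper (not part of Ktor or the AWS SDK): turns a raw
// Content-Encoding header value into a list of codings to decode.
// Assumption: a missing, empty, or whitespace-only value means the payload
// is served as-is, i.e. the "identity" coding.
fun normalizeContentEncoding(raw: String?): List<String> {
    if (raw == null) return listOf("identity")
    val codings = raw.split(',')
        .map { it.trim().lowercase() } // content codings are case-insensitive
        .filter { it.isNotEmpty() }    // drops the empty value S3 used to store
    return codings.ifEmpty { listOf("identity") }
}

fun main() {
    println(normalizeContentEncoding(""))            // [identity]
    println(normalizeContentEncoding("gzip"))        // [gzip]
    println(normalizeContentEncoding(" gzip , br ")) // [gzip, br]
}
```

Doing this in one place keeps the rest of the download path unchanged: an empty header simply falls through to the "no decoding" branch.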

AWS SDK for Kotlin version

1.2.50

Platform (JVM/JS/Native)

JVM

Operating system and version

macOS, Amazon Linux 2

ianbotsf commented 4 months ago

Hi @turboDi, thanks for the bug report! Unfortunately, I cannot reproduce this locally on SDK version 1.2.50. I've uploaded several objects to S3 with no contentEncoding parameter specified and they appear in S3 without that metadata included.

Can you please enable request logging and post the trace log of the API call with any sensitive information redacted?

Example of enabling request logging:

import aws.sdk.kotlin.services.s3.S3Client
import aws.smithy.kotlin.runtime.client.LogMode

S3Client {
    region = "us-west-2"
    logMode = LogMode.LogRequest // Log request headers but not the body
}
turboDi commented 4 months ago

Hi @ianbotsf, thanks for the quick response! Here is the trace log for the test:


2024-07-12 19:31:13.929 TRACE [main @coroutine#1] a.s.k.r.h.o.OperationHandler operation started ||
2024-07-12 19:31:14.027 TRACE [main @coroutine#1] a.s.k.r.a.a.CachedCredentialsProvider refreshing credentials cache ||
2024-07-12 19:31:14.029 TRACE [main @coroutine#1] a.s.k.r.i.IdentityProviderChain Attempting to resolve identity from aws.sdk.kotlin.runtime.auth.credentials.SystemPropertyCredentialsProvider@3b152928 ||
2024-07-12 19:31:14.031 TRACE [main @coroutine#1] a.s.k.r.a.c.SystemPropertyCredentialsProvider Attempting to load credentials from system properties aws.accessKeyId/aws.secretAccessKey/aws.sessionToken ||
2024-07-12 19:31:14.032 DEBUG [main @coroutine#1] a.s.k.r.i.IdentityProviderChain unable to resolve identity from aws.sdk.kotlin.runtime.auth.credentials.SystemPropertyCredentialsProvider@3b152928: Missing value for system property `aws.accessKeyId` ||
2024-07-12 19:31:14.034 TRACE [main @coroutine#1] a.s.k.r.i.IdentityProviderChain Attempting to resolve identity from aws.sdk.kotlin.runtime.auth.credentials.EnvironmentCredentialsProvider@a0a9fa5 ||
2024-07-12 19:31:14.035 TRACE [main @coroutine#1] a.s.k.r.a.c.EnvironmentCredentialsProvider Attempting to load credentials from env vars AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY/AWS_SESSION_TOKEN ||
2024-07-12 19:31:14.040 DEBUG [main @coroutine#1] a.s.k.r.h.o.AuthHandler resolved endpoint: Endpoint(uri=https://test-bucket.s3.us-east-2.amazonaws.com, headers=null, attributes={AttributeKey(aws.smithy.kotlin#endpointAuthSchemes)=[AuthOptionImpl(schemeId=AuthSchemeId(id=aws.auth#sigv4), attributes={AttributeKey(aws.smithy.kotlin.signing#AwsSigningRegion)=us-east-2, AttributeKey(aws.smithy.kotlin.signing#AwsSigningService)=s3, AttributeKey(aws.smithy.kotlin.signing#UseDoubleUriEncode)=false})]}) ||
2024-07-12 19:31:14.042 DEBUG [main @coroutine#1] a.s.k.r.h.i.FlexibleChecksumsRequestInterceptor no checksum algorithm specified, skipping flexible checksums processing ||
2024-07-12 19:31:14.058 TRACE [main @coroutine#1] a.s.k.r.a.a.DefaultAwsSignerImpl Canonical request:
PUT
/test.zip
x-id=PutObject
amz-sdk-invocation-id:d6be1597-6d5d-4c4b-9fad-457dc050acfe
amz-sdk-request:attempt=1; max=3
content-encoding:aws-chunked
content-type:application/zip
host:test-bucket.s3.us-east-2.amazonaws.com
transfer-encoding:chunked
x-amz-content-sha256:STREAMING-AWS4-HMAC-SHA256-PAYLOAD
x-amz-date:20240712T153114Z
x-amz-decoded-content-length:1307372
x-amz-user-agent:aws-sdk-kotlin/1.2.50

amz-sdk-invocation-id;amz-sdk-request;content-encoding;content-type;host;transfer-encoding;x-amz-content-sha256;x-amz-date;x-amz-decoded-content-length;x-amz-user-agent
STREAMING-AWS4-HMAC-SHA256-PAYLOAD ||
2024-07-12 19:31:14.060 TRACE [main @coroutine#1] a.s.k.r.a.a.DefaultAwsSignerImpl String to sign:
AWS4-HMAC-SHA256
20240712T153114Z
20240712/us-east-2/s3/aws4_request
22330e7b0542e58dd369be91282b100edaf70bb3748ae982b4b54c3a9125bfd5 ||
2024-07-12 19:31:14.060 DEBUG [main @coroutine#1] a.s.k.r.a.a.DefaultAwsSignerImpl Calculated signature: <REDACTED> ||
2024-07-12 19:31:14.077 DEBUG [main @coroutine#1] httpTraceMiddleware HttpRequest:
PUT /test.zip?x-id=PutObject
Host: test-bucket.s3.us-east-2.amazonaws.com
Content-Type: application/zip
User-Agent: aws-sdk-kotlin/1.2.50 ua/2.1 api/s3#1.2.50 os/macos#13.6.7 lang/kotlin#1.9.23 md/javaVersion#17.0.9 md/jvmName#OpenJDK_64-Bit_Server_VM md/jvmVersion#17.0.9+0 m/E
x-amz-user-agent: aws-sdk-kotlin/1.2.50
amz-sdk-invocation-id: d6be1597-6d5d-4c4b-9fad-457dc050acfe
amz-sdk-request: attempt=1; max=3
Content-Encoding: aws-chunked
Transfer-Encoding: chunked
X-Amz-Decoded-Content-Length: 1307372
X-Amz-Content-Sha256: STREAMING-AWS4-HMAC-SHA256-PAYLOAD
X-Amz-Date: 20240712T153114Z
Authorization: AWS4-HMAC-SHA256 Credential=<REDACTED>/20240712/us-east-2/s3/aws4_request, SignedHeaders=amz-sdk-invocation-id;amz-sdk-request;content-encoding;content-type;host;transfer-encoding;x-amz-content-sha256;x-amz-date;x-amz-decoded-content-length;x-amz-user-agent, Signature=<REDACTED>

 ||
2024-07-12 19:31:14.107 TRACE [DefaultDispatcher-worker-1 @call-context#2] a.s.k.r.h.e.o.OkHttpEngine call started ||
2024-07-12 19:31:14.119 TRACE [OkHttp https://test-bucket.s3.us-east-2.amazonaws.com/...] a.s.k.r.h.e.o.OkHttpEngine proxy select start: url=https://test-bucket.s3.us-east-2.amazonaws.com/ ||
2024-07-12 19:31:14.122 TRACE [OkHttp https://test-bucket.s3.us-east-2.amazonaws.com/...] a.s.k.r.h.e.o.OkHttpEngine proxy select end: url=https://test-bucket.s3.us-east-2.amazonaws.com/; proxies=[DIRECT] ||
2024-07-12 19:31:14.122 TRACE [OkHttp https://test-bucket.s3.us-east-2.amazonaws.com/...] a.s.k.r.h.e.o.OkHttpEngine dns query: domain=test-bucket.s3.us-east-2.amazonaws.com ||
2024-07-12 19:31:14.200 TRACE [OkHttp https://test-bucket.s3.us-east-2.amazonaws.com/...] a.s.k.r.h.e.o.OkHttpEngine dns resolved: domain=test-bucket.s3.us-east-2.amazonaws.com; records=[test-bucket.s3.us-east-2.amazonaws.com/52.219.179.42, test-bucket.s3.us-east-2.amazonaws.com/3.5.128.171, test-bucket.s3.us-east-2.amazonaws.com/3.5.130.118, test-bucket.s3.us-east-2.amazonaws.com/3.5.132.120, test-bucket.s3.us-east-2.amazonaws.com/3.5.131.123, test-bucket.s3.us-east-2.amazonaws.com/3.5.128.100, test-bucket.s3.us-east-2.amazonaws.com/3.5.129.118, test-bucket.s3.us-east-2.amazonaws.com/52.219.92.194] ||
2024-07-12 19:31:14.206 TRACE [OkHttp connect https://test-bucket.s3.us-east-2.amazonaws.com/...] a.s.k.r.h.e.o.OkHttpEngine starting connection: addr=test-bucket.s3.us-east-2.amazonaws.com/52.219.179.42:443; proxy=DIRECT ||
2024-07-12 19:31:14.408 TRACE [OkHttp https://test-bucket.s3.us-east-2.amazonaws.com/...] a.s.k.r.h.e.o.OkHttpEngine initiating TLS connection ||
2024-07-12 19:31:14.762 TRACE [OkHttp https://test-bucket.s3.us-east-2.amazonaws.com/...] a.s.k.r.h.e.o.OkHttpEngine TLS connect end: handshake=Handshake{tlsVersion=TLS_1_3 cipherSuite=TLS_AES_128_GCM_SHA256 peerCertificates=[CN=*.s3.us-east-2.amazonaws.com, CN=Amazon RSA 2048 M01, O=Amazon, C=US, CN=Amazon Root CA 1, O=Amazon, C=US] localCertificates=[]} ||
2024-07-12 19:31:14.765 TRACE [OkHttp https://test-bucket.s3.us-east-2.amazonaws.com/...] a.s.k.r.h.e.o.OkHttpEngine connection established: addr=test-bucket.s3.us-east-2.amazonaws.com/52.219.179.42:443; proxy=DIRECT; protocol=http/1.1 ||
2024-07-12 19:31:14.767 TRACE [OkHttp https://test-bucket.s3.us-east-2.amazonaws.com/...] a.s.k.r.h.e.o.OkHttpEngine connection acquired: conn(id=1330540524)=Connection{test-bucket.s3.us-east-2.amazonaws.com:443, proxy=DIRECT hostAddress=test-bucket.s3.us-east-2.amazonaws.com/52.219.179.42:443 cipherSuite=TLS_AES_128_GCM_SHA256 protocol=http/1.1}; connPool: total=1, idle=0 ||
2024-07-12 19:31:14.770 TRACE [OkHttp https://test-bucket.s3.us-east-2.amazonaws.com/...] a.s.k.r.h.e.o.OkHttpEngine sending request headers ||
2024-07-12 19:31:14.771 TRACE [OkHttp https://test-bucket.s3.us-east-2.amazonaws.com/...] a.s.k.r.h.e.o.OkHttpEngine finished sending request headers ||
2024-07-12 19:31:14.772 TRACE [OkHttp https://test-bucket.s3.us-east-2.amazonaws.com/...] a.s.k.r.h.e.o.OkHttpEngine sending request body ||
2024-07-12 19:31:15.947 TRACE [OkHttp https://test-bucket.s3.us-east-2.amazonaws.com/...] a.s.k.r.h.e.o.OkHttpEngine finished sending request body: bytesSent=1309257 ||
2024-07-12 19:31:16.333 TRACE [OkHttp https://test-bucket.s3.us-east-2.amazonaws.com/...] a.s.k.r.h.e.o.OkHttpEngine response headers start ||
2024-07-12 19:31:16.334 TRACE [OkHttp https://test-bucket.s3.us-east-2.amazonaws.com/...] a.s.k.r.h.e.o.OkHttpEngine response headers end: contentLengthHeader=0 ||
2024-07-12 19:31:16.336 TRACE [OkHttp https://test-bucket.s3.us-east-2.amazonaws.com/...] a.s.k.r.h.e.o.OkHttpEngine response body available ||
2024-07-12 19:31:16.336 TRACE [OkHttp https://test-bucket.s3.us-east-2.amazonaws.com/...] a.s.k.r.h.e.o.OkHttpEngine response body finished: bytesConsumed=0 ||
2024-07-12 19:31:16.337 TRACE [OkHttp https://test-bucket.s3.us-east-2.amazonaws.com/...] a.s.k.r.h.e.o.OkHttpEngine connection released: conn(id=1330540524)=Connection{test-bucket.s3.us-east-2.amazonaws.com:443, proxy=DIRECT hostAddress=test-bucket.s3.us-east-2.amazonaws.com/52.219.179.42:443 cipherSuite=TLS_AES_128_GCM_SHA256 protocol=http/1.1}; connPool: total=1, idle=1 ||
2024-07-12 19:31:16.338 TRACE [OkHttp https://test-bucket.s3.us-east-2.amazonaws.com/...] a.s.k.r.h.e.o.OkHttpEngine call complete ||
2024-07-12 19:31:16.344 DEBUG [main @coroutine#1] httpTraceMiddleware HttpResponse: 200: OK ||
2024-07-12 19:31:16.366 TRACE [main @coroutine#1] a.s.k.r.h.o.OperationHandler operation completed successfully ||
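The SignedHeaders line in the canonical request above follows the SigV4 rule: lowercase each header name, sort the names, and join them with ';'. The minimal sketch below uses the header names from the log (User-Agent does not appear in the signed list, so it is left out) and reproduces that line, which also shows that content-encoding: aws-chunked is covered by the request signature. The function name signedHeaders is hypothetical.

```kotlin
// Minimal sketch of how SigV4 builds the SignedHeaders value:
// lowercase each header name, keep one entry per name, sort, join with ';'.
fun signedHeaders(headerNames: Collection<String>): String =
    headerNames.map { it.lowercase() }.distinct().sorted().joinToString(";")

fun main() {
    // Header names from the signed request in the trace log above.
    val names = listOf(
        "Host", "Content-Type", "x-amz-user-agent", "amz-sdk-invocation-id",
        "amz-sdk-request", "Content-Encoding", "Transfer-Encoding",
        "X-Amz-Decoded-Content-Length", "X-Amz-Content-Sha256", "X-Amz-Date",
    )
    println(signedHeaders(names))
}
```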
ianbotsf commented 4 months ago

Ah, I see why I wasn't able to reproduce this: my test object was too small!

Since you're uploading an object over 1 MB, the SDK automatically uses chunked (aws-chunked) encoding. This feature allows the payload to be streamed to S3 without first reading the whole body to calculate the request signature.
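The byte counts in the trace log are consistent with this: the body goes out as signed aws-chunked frames, so bytesSent=1309257 exceeds X-Amz-Decoded-Content-Length: 1307372 by 1885 bytes. The sketch below reproduces that difference. It assumes 64 KiB chunks, a 64-hex-character chunk signature per frame, and no trailing checksum headers (the log reports that no checksum algorithm was specified). The chunk size is inferred from these numbers, not taken from the SDK source, and the function names are hypothetical.

```kotlin
// Sketch of aws-chunked (SigV4 streaming) framing overhead.
// Each data chunk is framed as:
//   hex(size) + ";chunk-signature=" + <64 hex chars> + "\r\n" + data + "\r\n"
// and the body ends with a zero-length chunk, which yields
//   "0;chunk-signature=<sig>\r\n\r\n".
const val CHUNK_SIZE: Long = 64L * 1024   // assumption: 64 KiB chunks
const val SIGNATURE_HEX_LENGTH = 64       // hex-encoded HMAC-SHA256
const val CHUNK_SIG_PREFIX = ";chunk-signature="
const val CRLF = "\r\n"

// Encoded size of one frame carrying `dataSize` payload bytes.
fun chunkFrameLength(dataSize: Long): Long =
    dataSize.toString(16).length +
        CHUNK_SIG_PREFIX.length + SIGNATURE_HEX_LENGTH + CRLF.length +
        dataSize + CRLF.length

// Total bytes on the wire for a payload of `decodedLength` bytes.
fun awsChunkedEncodedLength(decodedLength: Long, chunkSize: Long = CHUNK_SIZE): Long {
    require(decodedLength >= 0 && chunkSize > 0)
    val fullChunks = decodedLength / chunkSize
    val remainder = decodedLength % chunkSize
    var total = fullChunks * chunkFrameLength(chunkSize)
    if (remainder > 0) total += chunkFrameLength(remainder)
    total += chunkFrameLength(0) // terminating zero-length chunk
    return total
}

fun main() {
    // Decoded length from X-Amz-Decoded-Content-Length in the trace log.
    println(awsChunkedEncodedLength(1_307_372)) // 1309257, the logged bytesSent
}
```

Here 1307372 bytes split into 19 full 64 KiB chunks plus one 62188-byte chunk; with the terminating chunk, the framing adds exactly 1885 bytes.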

Unfortunately, it appears that using chunked encoding has a side effect. From the S3 documentation on chunked uploads:

If you specify Content-Encoding in your request as Content-Encoding : aws-chunked, S3 adds an empty value for Content-Encoding and stores the object metadata (Content-Encoding :) to the resulting object.

One way to get around this is to disable the use of chunked encoding on your client:

S3Client {
    region = "us-west-2"
    enableAwsChunked = false
}

What kind of errors or problems are you seeing when downloading S3 objects that have an empty Content-Encoding header?

turboDi commented 4 months ago

I see, thank you for the detailed answer; enableAwsChunked did the trick.

I'm using the Ktor HTTP client to download S3 objects via presigned URLs. Responses for these presigned URLs include an empty Content-Encoding: header, which Ktor doesn't support out of the box. I can tweak my Ktor client to accept this empty encoding, but it looked like a bug to me and made me worry that our other clients might be confused by this non-standard header value.

Anyway, this is not an SDK issue, so I think we can close it.

ianbotsf commented 4 months ago

I found a cross-SDK issue about this: https://github.com/aws/aws-sdk/issues/498. I'm checking internally on the status with S3, but there may actually be a bug here that could be fixed by S3 or by the SDKs.

ianbotsf commented 4 months ago

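I've confirmed that this is indeed an S3 bug, and a fix that addresses the issue independently of the SDK is in progress. I don't have an estimated timeline to share, so in the meantime you'll have to either disable chunked encoding on the client (enableAwsChunked = false) or handle the empty Content-Encoding header in your downloading code.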

Please track https://github.com/aws/aws-sdk/issues/498 for updates on the S3 fix.

ianbotsf commented 1 week ago

S3 has now resolved this issue. Going forward, uploading objects with chunked encoding no longer results in an empty Content-Encoding: value being stored or returned. The service documentation has also been updated to reflect this:

Amazon S3 stores the resulting object without the aws-chunked value in the content-encoding header. If aws-chunked is the only value that you pass in the content-encoding header, S3 considers the content-encoding header empty and does not return this header when you retrieve the object.
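The documented rule can be modeled as a small function: drop aws-chunked from the request's Content-Encoding list and, if nothing remains, store no Content-Encoding at all. This is a hypothetical illustration of the documented behavior, not S3 code; the function name and the comma-joined output format are assumptions.

```kotlin
// Hypothetical model of the documented S3 rule (not actual S3 code):
// aws-chunked only describes how the body was transferred, so it is removed
// from the stored Content-Encoding; an empty remainder means no header.
fun storedContentEncoding(requestValue: String?): String? {
    val remaining = requestValue.orEmpty()
        .split(',')
        .map { it.trim() }
        .filter { it.isNotEmpty() && !it.equals("aws-chunked", ignoreCase = true) }
    // Assumption: any remaining codings are re-joined with "," in their original order.
    return if (remaining.isEmpty()) null else remaining.joinToString(",")
}

fun main() {
    println(storedContentEncoding("aws-chunked"))      // null
    println(storedContentEncoding("aws-chunked,gzip")) // gzip
}
```

Before the fix, the first case effectively produced an empty string instead of null, which is the empty Content-Encoding: value reported in this issue.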

For the Kotlin SDK specifically, it should no longer be necessary to disable chunked encoding via enableAwsChunked = false.