aws / aws-sdk-cpp

AWS SDK for C++
Apache License 2.0
1.96k stars 1.06k forks source link

Upload fails with InvalidChunkSizeError for large objects #3132

Open rohansuri opened 2 weeks ago

rohansuri commented 2 weeks ago

Describe the bug

Hello,

I'm trying a simple S3 PUT of an object of size 1MB. Through some reading of the code, I found that the sdk reads the request body at least thrice:

So I decided to:

For small objects (1KB), it works fine. For large object, of size 1MB, I see the following error:

Unable to parse ExceptionName: InvalidChunkSizeError Message: Only the last chunk is allowed to have a size less than 8192 bytes

Expected Behavior

The request should succeed even with custom configurations set that override the default checksum algorithm as well as payload signing policy.

Current Behavior

The PUT request fails with HTTP error code 403 stating InvalidChunkSizeError.

Reproduction Steps

TEST_F(S3Test, crc){
    Aws::SDKOptions options;
    Aws::InitAPI(options);

    Aws::S3::S3ClientConfiguration config;
    config.payloadSigningPolicy =
            Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy::Never;
    auto client = std::make_unique<Aws::S3::S3Client>(config);

    Aws::S3::Model::PutObjectRequest request;
    request.SetBucket("s3-gtest");
    request.SetKey("crctest");
    // Use a checksum that is streaming and so doesn't require reading the
    // request body twice.
    request.SetChecksumAlgorithm(Aws::S3::Model::ChecksumAlgorithm::CRC32C);

    std::string s(1024*1024, 'a');
    std::stringstream ss;
    ss << s;
    auto stream = Aws::MakeShared<Aws::IOStream>(nullptr, ss.rdbuf());
    request.SetBody(stream);

    auto outcome = client->PutObject(request);
    if(!outcome.IsSuccess()){
        std::cout << "outcome err = " << outcome.GetError().GetMessage() << std::endl;
    }
    client.reset();
    Aws::ShutdownAPI(options);
}

stdout:

outcome err = Unable to parse ExceptionName: InvalidChunkSizeError Message: Only the last chunk is allowed to have a size less than 8192 bytes

Possible Solution

No response

Additional Information/Context

No response

AWS CPP SDK version used

1.11.411

Compiler and Version used

Apple clang version 15.0.0 (clang-1500.3.9.4)

Operating System and version

MacOS 14.4.1

rohansuri commented 2 weeks ago

@DmitriyMusatkin any thoughts?

jmklix commented 2 weeks ago

We are looking into this

rohansuri commented 2 weeks ago

Thanks for the update @jmklix

sbiscigl commented 2 weeks ago

@rohansuri I was able to replicate this consistenly, truly appreciate the complete reproduction. This issue is that in our curl client where we apply aws-chunking curl never guarantees that 8KB will be present in a chunk. This is a fundamental design flaw that we are working on ironing out right now. We are currently working on a checksumming code refactor to make this more straight forward and easier to follow. We are treating this with a high priority

rohansuri commented 2 weeks ago

Thank you @sbiscigl for looking into this!

Once the issue is fixed and the client is configured to:

is it ensured that the request body for PUT requests will only be streamed once?

Also is there a way for now to workaround this issue (maybe switching the internal HTTP client implementation that is being used)?

sgoth commented 1 week ago

I'm also seeing this error randomly with PUTs via TransferManager::UploadFile on 1.11.352. Not sure if TransferManager selects such a checksum algo internally - i don't set one explicitly.

Retrying makes it work sooner or later...

rohansuri commented 1 week ago

@sbiscigl any updates on this?

sgoth commented 1 week ago

Tried to see what checksumming is in use for me.

As i use contentMD5 = true, multipart uploads set it to NOT_SET - which is overwritten to MD5 internally as i learned here: https://github.com/aws/aws-sdk-cpp/issues/2968

Relevant code in TransferManager: https://github.com/aws/aws-sdk-cpp/blob/7c44fbe71e2abc18bb89e1baf8aa6ade78f6d21c/src/aws-cpp-sdk-transfer/source/transfer/TransferManager.cpp#L390-L393

For SinglePart it seems completely off (probably a failed merge conflict resolution): https://github.com/aws/aws-sdk-cpp/blob/7c44fbe71e2abc18bb89e1baf8aa6ade78f6d21c/src/aws-cpp-sdk-transfer/source/transfer/TransferManager.cpp#L550-L553

First it is set to NOT_SET then again to TransferManagerConfig default CRC32

Is it currently possible to get reliable PUTs with any checksum/contentMD5 setting?

Edit: As i was testing with file of around 2MB in size this resulted in the SinglePart path. Explicitly setting the TransferManagerConfig.checksumAlgorithm to NOT_SET (seemingly) makes it work correctly. 100/100 uploads worked when before 1/3 did not - always the same file.

So not only CRC32C but also CRC32 seems problematic.

sbiscigl commented 1 week ago

@sbiscigl any updates on this?

We're refactoring a lot of how we are doing checksum to address continuing issues with checksumming that we see i.e. pull/3134 pull/3140

we're revisiting and re-working a lot of the code around checksums and S3.

Explicitly setting the TransferManagerConfig.checksumAlgorithm to NOT_SET (seemingly) makes it work correctly.

checksum algorithm NOT_SET will default to MD5 in the SDK. setting the transfer manager config to use contentMD5 will result it to use NOT_SET which in turn will use MD5.

So not only CRC32C but also CRC32 seems problematic.

Its not a specific checksum its aws-chunked. We are working at fixing it, and we're doing a hoslitic refactor to make it more user friendly for everyone as we understand its quite frustrating right now.

rohansuri commented 1 week ago

@sbiscigl thank you for the update.

rohansuri commented 1 week ago

As part of this, will we also look into permitting the user to completely turn off the checksum?

sbiscigl commented 1 week ago

As part of this, will we also look into permitting the user to completely turn off the checksum?

yes

sbiscigl commented 1 week ago

also merged your request @sgoth, good catch "probably a failed merge conflict resolution)" is correct, sorry about that, thank you for bringing it to our attention and helping us fix it.