aws / aws-sdk-cpp

AWS SDK for C++
Apache License 2.0
1.98k stars 1.06k forks source link

ensure WinSyncHttpClient aws-chunk size is at least 8 KB #2893

Closed aslip closed 7 months ago

aslip commented 8 months ago

Issue #, if available: N/A

Description of changes:

aws-chunked encoding documentation is pretty scarce, the best article that I could find - https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-streaming.html - says the chunk size must be at least 8 KB.

In WinSyncHttpClient, chunk size essentially is defined by bytesToRead.

Before the fix, the client subtracted 8 bytes for chunk metadata from an already allocated 8 KB buffer, resulting in a chunk size of less than 8 KB (8192 - 8 = 8184 bytes).

And, for example, AWS S3 service endpoint indeed returned an error when SDK PutObject() made multiple chunk upload (if using flexible checksums and object size > 8184):

trace log snippet ``` [TRACE] 2024-03-17 21:38:18.657 WinHttpSyncHttpClient [6492] Making PUT request to uri https://put-object-checksum2.s3.eu-west-2.amazonaws.com/object-name ... [DEBUG] 2024-03-17 21:38:18.658 WinHttpSyncHttpClient [6492] with headers: [DEBUG] 2024-03-17 21:38:18.658 WinHttpSyncHttpClient [6492] amz-sdk-invocation-id: 892204CA-DD86-43A6-8CE9-59BFCEBE55B6 amz-sdk-request: attempt=1 authorization: AWS4-HMAC-SHA256 Credential=.../20240317/eu-west-2/s3/aws4_request, SignedHeaders=amz-sdk-invocation-id;amz-sdk-request;content-encoding;content-type;host;transfer-encoding;x-amz-content-sha256;x-amz-date;x-amz-decoded-content-length;x-amz-sdk-checksum-algorithm;x-amz-security-token;x-amz-trailer, Signature=398bd12331cc2fce53b0bd98bb399836811e34ae2020f363c018a85fa7e116e5 content-encoding: aws-chunked content-type: binary/octet-stream host: put-object-checksum2.s3.eu-west-2.amazonaws.com transfer-encoding: chunked user-agent: aws-sdk-cpp/1.11.273 ua/2.0 md/aws-crt#0.26.2 os/Windows#10.0.20348.1070 md/arch#AMD64 lang/c++#C++199711L md/MSVC#1929 cfg/retry-mode#default exec-env/EC2 api/S3 x-amz-content-sha256: STREAMING-UNSIGNED-PAYLOAD-TRAILER x-amz-date: 20240317T213818Z x-amz-decoded-content-length: 8185 x-amz-sdk-checksum-algorithm: CRC32C x-amz-security-token: ... x-amz-trailer: x-amz-checksum-crc32c [DEBUG] 2024-03-17 21:38:18.677 WinHttpSyncHttpClient [6492] Received response code 403 [DEBUG] 2024-03-17 21:38:18.677 WinHttpSyncHttpClient [6492] Received content type application/xml [DEBUG] 2024-03-17 21:38:18.677 WinHttpSyncHttpClient [6492] Received headers: [DEBUG] 2024-03-17 21:38:18.677 WinHttpSyncHttpClient [6492] HTTP/1.1 403 Forbidden Connection: close Date: Sun, 17 Mar 2024 21:38:18 GMT Transfer-Encoding: chunked Content-Type: application/xml Server: AmazonS3 x-amz-request-id: HAHMMX0XV8YV55BE x-amz-id-2: f3XrHIrUlDTkGMQtN9/WYDrbJzmWBfBKO4H7GIfvDp9NP6UP1hNUBTmXlfWPxDVaH9XwssbHOAX2ld5gYf4rtg== ... [DEBUG] 2024-03-17 21:38:18.678 AWSClient [6492] Request returned error. Attempting to generate appropriate error codes from response [TRACE] 2024-03-17 21:38:18.678 AWSErrorMarshaller [6492] Error response is InvalidChunkSizeError Only the last chunk is allowed to have a size less than 8192 bytes 2 8184 HAHMMX0XV8YV55BE f3XrHIrUlDTkGMQtN9/WYDrbJzmWBfBKO4H7GIfvDp9NP6UP1hNUBTmXlfWPxDVaH9XwssbHOAX2ld5gYf4rtg== [WARN] 2024-03-17 21:38:18.679 AWSErrorMarshaller [6492] Encountered Unknown AWSError 'InvalidChunkSizeError': Only the last chunk is allowed to have a size less than 8192 bytes [ERROR] 2024-03-17 21:38:18.679 AWSXmlClient [6492] HTTP response code: 403 Resolved remote host IP address: Request ID: HAHMMX0XV8YV55BE Exception name: InvalidChunkSizeError Error message: Unable to parse ExceptionName: InvalidChunkSizeError Message: Only the last chunk is allowed to have a size less than 8192 bytes 7 response headers: connection : close content-type : application/xml date : Sun, 17 Mar 2024 21:38:18 GMT server : AmazonS3 transfer-encoding : chunked x-amz-id-2 : f3XrHIrUlDTkGMQtN9/WYDrbJzmWBfBKO4H7GIfvDp9NP6UP1hNUBTmXlfWPxDVaH9XwssbHOAX2ld5gYf4rtg== x-amz-request-id : HAHMMX0XV8YV55BE ```

Proposed fix adds 8 bytes at the moment of buffer allocation.

Because WinSyncHttpClient::StreamPayloadToRequest() currently handles aws-chunked encoding, its streamBuffer should be prepared to handle both chunked and non-chunked payloads. A tradeoff for non-chunked is extra 8 bytes allocated on the stack, addressing this using alloca() or vector would create more problems than it solves.


Possible alternative is to use multiple DoWriteData() calls or even move aws-chunked content encoding handling to DoWriteData(), as is already done for chunked transfer encoding. But this would be a bigger change.


Another alternative, easier to implement but probably harder to justify, is simply:

-static const uint32_t HTTP_REQUEST_WRITE_BUFFER_LENGTH = 8192;
+static const uint32_t HTTP_REQUEST_WRITE_BUFFER_LENGTH = 65536;

Basically, use any buffer length that’s bigger than 8 KB + chunked metadata length. This is how CURL client works despite having the same logic. I just took 64 KB because it’s recommended in the linked AWS doc.


I think this can be tested only with integration tests. Personally, I tested with modified BucketAndObjectOperationTest.TestFlexibleChecksums. If it’s ok then I can add that to this PR as well.


Check all that applies:

Check which platforms you have built SDK on to verify the correctness of this PR.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.