awslabs / aws-crt-java

Java bindings for the AWS Common Runtime
Apache License 2.0
57 stars 39 forks source link

uploadFile fails with CRT S3TransferManager #691

Closed kjc14f closed 4 months ago

kjc14f commented 1 year ago

Describe the bug

When using the S3 CRT client with the S3TransferManager, file uploads don't seem to work. Previously this was working okay when using other S3AsyncClient however when switching to the CRT one, when the uploadFile() method is called an error is recieved regarding a SHA256 mismtach.

I have not specififed a SHA256 hash anywhere though, in fact I have manually specified a CRC32C hash. If I set checksumValidationEnabled(false) on the async client the error goes away however I don't want to disable this validation.

The file I am trying to upload is a very small 1KB txt file which is static on my file system so I can confirm it is not being modified at the time of the upload etc.

My client uses some custom options as seen below such as overrideEndpoint(), forcePathStyle() and trustAllCertificatesEnabled() which might be a factor.

Expected Behavior

File is uploaded to bucket

Current Behavior

The following error is recieved: The Content-SHA256 you specified did not match what we receieved

Reproduction Steps

Create a CRT async client and a TransferManager using that client. Then create an UploadFileRequest similar to below. Finally call uploadFile()

S3AsyncClient client = S3AsyncClient.crtBuilder()
        .endpointOverride(<endpoint>)
        .forcePathStyle(true)
        .httpConfiguration(e -> e.trustAllCertificatesEnabled(true))
        .credentialsProvider(StaticCredentialsProvider.create(<creds>))
        .region(Region.of(<region>))
        .build();

S3TransferManager transferManager = S3TransferManager.builder()
        .s3Client(client)
        .build();

UploadFileRequest uploadFileRequest = UploadFileRequest.builder()
        .putObjectRequest(e -> e
                .bucket("bucket")
                .key("key")
                .metadata(Map.of("test", "test"))
                                .contentType("text/plain")
                .checksumAlgorithm(ChecksumAlgorithm.CRC32_C)
                .checksumCRC32C("123"))
        .source(new File(""))
        .build();

FileUpload upload = transferManager.uploadFile(uploadFileRequest);
upload.completionFuture().join();

I tried simplifying the UploadFileRequest to the following (with no CRC32C params) however this didn't seem to have any effect:

UploadFileRequest uploadFileRequest = UploadFileRequest.builder()
        .putObjectRequest(e -> e
                .bucket("bucket")
                .key("key"))
        .source(new File(""))
        .build();

Possible Solution

No response

Additional Information/Context

No response

aws-crt-java version used

0.27.3

Java version used

17.0.8

Operating System and version

Windows 10 1909

kjc14f commented 1 year ago

After some more playing around it seems that with checksumCRC32C() I get a different error which is different from what I was getting before so perhaps the IDE hadn't rebuilt the class.

With checksumCRC32C() set on the UploadFileRequest = SdkClientException: Unable to execute HTTP request: S3Client.aws_s3_client_make_meta_request: creating aws_s3_meta_request failed (aws_last_error: AWS_ERROR_INVALID_ARGUMENT(34), An invalid argument was passed to a function.)

With checksumCRC32C() removed on the UploadFileRequest = The Content-SHA256 you specified did not match what we receieved

So I guess the CRT client may not support setting the CRC32C checksum manually? The issue still remains either way though where I can't seem to upload the file.

EDIT: Since it was talking about the SHA256 hash I tried setting checksumSHA256() on the UploadFileRequest and that gives the same AWS_ERROR_INVALID_ARGUMENT(34) as above. So it seems any attempt to set the checksum manually causes this error and any attempt to change the checksumAlgorithm() is ignored in favour of SHA256

kjc14f commented 1 year ago

I continued playing around with this issue a little and the issue seems a little more fundemental than I first thought. I tried removing the S3TransferManager completely and just doing a super simple upload using a CRT client as per the documentation example here. The example below also gives the error The Content-SHA256 you specified did not match what we receieved preventing upload.

S3AsyncClient client = S3AsyncClient.crtBuilder()
        .endpointOverride(<endpoint>)
        .forcePathStyle(true)
        .httpConfiguration(e -> e.trustAllCertificatesEnabled(true))
        .credentialsProvider(StaticCredentialsProvider.create(<creds>))
        .region(Region.of(<region>))
        .build();

PutObjectResponse putObjectResponse = 
      client.putObject(req -> req.bucket(<BUCKET_NAME>)
                                   .key(<KEY_NAME>),
                        AsyncRequestBody.fromFile(Paths.get(<FILE_NAME>)))
              .join();
DmitriyMusatkin commented 1 year ago

Content-SHA256 is part of sigv4 body signing (https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-header-based-auth.html) and is different from flexible checksums like CRC32. SDK will typically not sign body if request is over https.

Can you provide more details on what you are overriding endpoint to? Is it pointing to an alternative implementation of s3 interface? is it going over http or https?

Also enabling crt logging might help with debugging whats going on (https://github.com/awslabs/aws-crt-java/blob/main/src/main/java/software/amazon/awssdk/crt/Log.java#L195).

kjc14f commented 1 year ago

Thanks for the info, that's useful! The overriding endpoint points to an S3 compatible instance and is HTTPS. It uses a cert that I can't add to my OS truststore at the moment which is why the trustAllCertificatesEnabled() comes into play.

I tried the request with trace logs as you suggested. Initially it says UNSIGNED-PAYLOAD then it says: Http request successfully built final authorization value via algorithm SigV4, with contents. It then states a Signature= however the value of that signature appears to be different from the SHA256 hash I get for the file when I hash it using Guava. Presumably it is the Guava hash value the server is expecting which is causing the Content-SHA256 mismatch error?

DmitriyMusatkin commented 1 year ago

are you seeing x-amz-content-sha256 header set to UNSIGNED-PAYLOAD? If yes, then it should be correct behavior that indicates body is not signed as part of sigv4.

Signature is an hmac-sha256 of several fields in the request as described in https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-header-based-auth.html and it will not match sha256 of the body.

Before debugging any further can you confirm that regular s3 client request works fine with the s3 instance you are using? Since you are sending 1kb of data, the request that transfer manager sends should be nearly identical to the request that regular s3 client will send.

kjc14f commented 1 year ago

Apologies, of course it makes sense for that SHA-256 to be different.

As suggested, I carried out some tests with debug logs on other clients.

DmitriyMusatkin commented 1 year ago

So for S3Client and Transfer Manager with S3AsyncClient it falls back to sending file using a single chunk, which is indicated by x-amz-content-sha256:UNSIGNED-PAYLOAD. That makes sense, since typically SDKs consider anything <64kb to be a single chunk.

CRT defaults to uploading files using whats called aws-chunked encoding when flexible checksum like crc32 is specified (https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-streaming.html). With that encoding body is still unsigned but x-amz-content-sha256:STREAMING-UNSIGNED-PAYLOAD-TRAILER is used to indicate that its streaming data.

With regular s3 there is no difference between the 2 flows. But if wondering if there is some difference in behavior with the s3 instance you are using.

Can you try sending file thats bigger than 64kb using S3Client and TransferManager? that should trigger the upload to use aws-chunked encoding.

kjc14f commented 1 year ago

When repeating the tests with an ~20MB file, the behviour seems to be as follows: For test 1 & 2, behaviour seems the same. Both uploads succeed. For test 3, the upload fails with the same SHA256 exception as previous. I can see it initially says x-amz-content-sha256:UNSIGNED-PAYLOAD which was different to before. It then goes on to say Signing successfully built string-to-sign via algorithm SigV4, with contents AWS4-HMAC-SHA256 followed by a timestamp, path and hash which was the same behaviour as before. Interstingly, it does print The Content-SHA256 you specified did not match what we receieved 3 times before it prints the block where it prints the x-amz-content-sha256 header which it didn't do before. It then prints that error one last time as the request fails

Johny-Ch commented 9 months ago

We ran into the same issue with S3CrtAsyncClient, do we have a workaround or fix in any of the latest versions for this error?

jmklix commented 7 months ago

@kjc14f and @Johny-Ch are you using this with aws s3 buckets or are you using a similar service that might be behaving slightly differently?

github-actions[bot] commented 7 months ago

Greetings! It looks like this issue hasn’t been active in longer than a week. We encourage you to check if this is still an issue in the latest release. Because it has been longer than a week since the last update on this, and in the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment or add an upvote to prevent automatic closure, or if the issue is already closed, please feel free to open a new one.

Johny-Ch commented 7 months ago

@kjc14f and @Johny-Ch are you using this with aws s3 buckets or are you using a similar service that might be behaving slightly differently?

I'm using a similar service and hence this issue I suppose

jmklix commented 4 months ago

We can't fix/change a third party service, but if you can reproduce this with AWS services then we can work on a fix that might fix both. Please let me know if you can reproduce this with AWS s3, otherwise there is not much I can do other than close this issue

github-actions[bot] commented 4 months ago

Greetings! It looks like this issue hasn’t been active in longer than a week. We encourage you to check if this is still an issue in the latest release. Because it has been longer than a week since the last update on this, and in the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment or add an upvote to prevent automatic closure, or if the issue is already closed, please feel free to open a new one.