richardnpaul opened 3 weeks ago
Thanks for reaching out. In your `upload_part` request, have you tried setting `ChecksumAlgorithm` to `CRC32C` and specifying a string for `ChecksumCRC32C`? You could also try another approach such as `put_object`, although it was noted that installing the CRT was required. Otherwise, if you'd like to share your debug logs (with any sensitive info redacted) by adding `boto3.set_stream_logger('')` to your script, we could investigate this further.
Hi @tim-finnigan, thanks for getting back in touch so quickly.
We did try the approach of setting `ChecksumAlgorithm` to `CRC32C`, which I believe then required setting the `x-amz-sdk-checksum-crc32c` header, but we were getting an error with that method too (I'll need to check the docs again — we were reading them following the points for the REST API rather than the SDK — and I'll need to check with the person who was testing this with me tomorrow). The code for this is abstracted out behind a set of APIs and a calling CLI (not Python based) installed by our end users.
Our workflow is this: the CLI calls an initiate endpoint to start an upload. On success, the CLI calls a generate-pre-signed-URLs endpoint, which takes the parts and their checksums and returns the part numbers with the pre-signed URLs for those parts (this is the call that uses `generate_presigned_url` with the `upload_part` client method). The CLI then uses the pre-signed URLs to PUT the file parts directly to S3 with the CRC32C checksum in the header, and once that's complete it calls a complete endpoint, submitting the parts, ETags and CRC32C checksums.
With the description above out of the way: `put_object` is not suitable for our workflow because the end users are using the CLI package, which is also why we need pre-signed URLs. Sorry for any confusion that might have led you to suggest it — the above code was just a minimal amount of boilerplate to reproduce the issue we were seeing.
I will note that we do already have `awscrt` in our dependency chain.
We have run this through successfully by removing the need for the checksums and it all works, so worst case we could fall back to the historic way of doing this using `ContentMD5`. However, we were hoping to use the same approach we use for smaller unitary uploads, which uses `generate_presigned_post` and which we do have working with CRC32C checksums. I'm well aware that we seem to be on the outer fringes of what we're trying to achieve here with boto, so all help is greatly appreciated.
I've done some testing today and here's a table of what I get back from the PUT to S3. I tested every combination of `ChecksumAlgorithm` and `ChecksumCRC32C` on the `upload_part` side, and `x-amz-checksum-crc32c` and `x-amz-sdk-checksum-algorithm` on the PUT-headers side (we didn't get any different results when passing content-type and/or content-length as well as these):
| headers ▼ / params ► | Nothing | `ChecksumCRC32C` only | `ChecksumAlgorithm` only | Both |
|---|---|---|---|---|
| Nothing | 200 | 403: SignatureDoesNotMatch \*1 | 403: SignatureDoesNotMatch \*1 | 403: SignatureDoesNotMatch \*1 |
| `x-amz-checksum-crc32c` | 403: AccessDenied \*2 | 400: InvalidRequest \*3 | 403: AccessDenied \*2 | 403: SignatureDoesNotMatch \*1 |
| `x-amz-sdk-checksum-algorithm` | 403: AccessDenied \*2 | 403: AccessDenied \*2 | 400: InvalidRequest \*4 | 403: SignatureDoesNotMatch \*1 |
| Both | 403: AccessDenied \*2 | 403: AccessDenied \*2 | 403: AccessDenied \*2 | 400: InvalidRequest \*3 |

\*1: The request signature we calculated does not match the signature you provided. Check your key and signing method.
\*2: There were headers present in the request which were not signed.
\*3: Checksum Type mismatch occurred, expected checksum Type: null, actual checksum Type: crc32c.
\*4: x-amz-sdk-checksum-algorithm specified, but no corresponding x-amz-checksum-* or x-amz-trailer headers were found.
Hi @richardnpaul, thanks for following up here. Going back to your original snippet, you are using CRC32 and not CRC32C (`from zlib import crc32`). It looks like there are no plans to support CRC32C in `zlib`: https://github.com/madler/zlib/issues/981. Have you tried any alternatives that support CRC32C?
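For anyone following along: third-party packages such as `crc32c` or `google-crc32c` provide this. Purely as an illustration of the difference, CRC-32C (Castagnoli) uses a different polynomial from `zlib.crc32`, and S3 checksum headers expect the big-endian digest base64-encoded. A bitwise sketch (use a real package in production — this is slow and only for clarity):

```python
import base64

def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli): reflected, poly 0x82F63B78,
    init and final XOR 0xFFFFFFFF. Illustration only."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

def s3_checksum(data: bytes) -> str:
    """Base64 of the big-endian CRC-32C -- the format S3's
    x-amz-checksum-crc32c header uses."""
    return base64.b64encode(crc32c(data).to_bytes(4, "big")).decode()

# Standard CRC-32C check value:
assert crc32c(b"123456789") == 0xE3069283
```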
Hi Tim,
Okay, so yes — as noted in my initial comments, we use the `crc32c` package. We're just trying to test that the checksums work, so it doesn't matter which algorithm we use as long as the value is valid.
I've taken your code and made a couple of changes: I added `aws_access_key_id` etc. to the `s3_client` instantiation, and I changed the bucket name, object key and `testfile` variables; otherwise I didn't change anything else... and I got an error, `Failed to upload part 1, status: 403, response: <?xml version="1.0" encoding="UTF-8"?>`, which was a SignatureDoesNotMatch response: "The request signature we calculated does not match the signature you provided. Check your key and signing method."
I had the bucket deployed in `eu-west-2`, so I tried creating a bucket in another region, `eu-west-1`, to see if the issue persisted. After initially thinking that it did, and working through some issues, I changed all the region references in my `.aws/config` file from `eu-west-2` to `eu-west-1`, and we have success... but not in the region that I'm trying to use :disappointed:
_(I realised shortly after that I could have just added `region_name="eu-west-1"` to the S3 client so that I didn't have to change my config file :facepalm:)_
So, at this point I'm not sure if this is a botocore/boto3 issue or an AWS infrastructure issue :thinking: (...or something else)
Just some additional information: adding an explicit v4 `signature_version` via `botocore.config` results in the same error in both `eu-west-1` and `eu-west-2`:

```python
import boto3
from botocore.config import Config

my_config = Config(signature_version='v4')
s3_client = boto3.client('s3', config=my_config)
```
### Describe the bug

When trying to upload a large object to S3 using the multipart upload process with presigned URLs and CRC32C checksums, the response from S3 is a 400 error with an error message.
### Expected Behavior

I would expect the provided checksum headers to be expected by S3, so that the checksum type would be `crc32c` rather than `null`, and the upload to S3 would then succeed.
### Current Behavior

The following type of error message is returned instead of success:
### Reproduction Steps

Change all the AWS credentials for valid values for your testing and provide a file on the `testfile` assignment line (I was using a path in `~/Downloads/`).

### Possible Solution
I suspect the checksum header is not being included in the signing process, but to be honest I got a bit lost in the library's code and couldn't make head nor tail of it in the end.
### Additional Information/Context

Docs page for generating the URLs: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/generate_presigned_url.html
Docs page with acceptable params to be passed to `generate_presigned_url` when using `upload_part` as the `ClientMethod`: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/upload_part.html

### SDK version used
1.34.138
### Environment details (OS name and version, etc.)

Ubuntu 22.04.4, Python 3.10.12