Open coolacid opened 1 year ago
Digging.
When the parts are uploaded to S3, S3 returns the corresponding checksum for each part. Example (from a debugging print statement):
Line 2174: out={'ResponseMetadata': {'RequestId': '...', 'HostId': '...', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amz-id-2': '...', 'x-amz-request-id': '...', 'date': 'Sun, 11 Dec 2022 18:16:14 GMT', 'etag': '"..."', 'x-amz-checksum-sha256': 'JYVhopFfH6xHs34sIajC/UdtzVojFMP1zktFGsGw8h0=', 'x-amz-server-side-encryption': 'AES256', 'server': 'AmazonS3', 'content-length': '0', 'connection': 'close'}, 'RetryAttempts': 0}, 'ServerSideEncryption': 'AES256', 'ETag': '"..."', 'ChecksumSHA256': 'JYVhopFfH6xHs34sIajC/UdtzVojFMP1zktFGsGw8h0='}
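For anyone wanting to sanity-check the value above: the x-amz-checksum-sha256 header is the base64 encoding of the raw SHA-256 digest of the part body. A minimal sketch for computing the same value locally follows; the function name and the sample bytes are made up for illustration and are not part of s3fs.

```python
import base64
import hashlib


def s3_sha256_checksum(part_body: bytes) -> str:
    """Return the base64-encoded SHA-256 digest of one part body."""
    digest = hashlib.sha256(part_body).digest()  # raw 32-byte digest, not hex
    return base64.b64encode(digest).decode("ascii")


# Hypothetical part body, for illustration only.
local_checksum = s3_sha256_checksum(b"example part body")
print(local_checksum)
```

Comparing this against the ChecksumSHA256 field returned by the part upload should show whether the part arrived intact.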
This checksum should be sent back in the CompleteMultipartUpload call. See https://docs.aws.amazon.com/AmazonS3/latest/API/API_CompleteMultipartUpload.html
A quick and dirty test adding the ChecksumSHA256 here https://github.com/fsspec/s3fs/blob/main/s3fs/core.py#L2171 gets me a working system.
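For reference, a rough sketch of the idea, not the actual s3fs code: carry whatever checksum field each part upload returned into the Parts list that is later passed to CompleteMultipartUpload. The helper name build_parts is hypothetical; the field names (PartNumber, ETag, ChecksumSHA256, etc.) are the ones the S3 API documents for completed parts.

```python
from typing import Any, Dict, List

# Checksum fields S3 may return for an uploaded part.
CHECKSUM_KEYS = ("ChecksumCRC32", "ChecksumCRC32C", "ChecksumSHA1", "ChecksumSHA256")


def build_parts(upload_part_responses: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Build the Parts list for CompleteMultipartUpload (hypothetical helper)."""
    parts = []
    # Part numbers start at 1 and follow upload order in this sketch.
    for number, out in enumerate(upload_part_responses, start=1):
        part = {"PartNumber": number, "ETag": out["ETag"]}
        # Copy through any checksum S3 returned, so it is echoed back on completion.
        for key in CHECKSUM_KEYS:
            if key in out:
                part[key] = out[key]
        parts.append(part)
    return parts
```

The result would then be passed as MultipartUpload={"Parts": build_parts(responses)}. Parts uploaded without a checksum algorithm simply keep PartNumber and ETag, so existing behaviour should be unchanged.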
Thanks for looking into this. It sounds like you have a solution - should this be put into a PR?
As much as I'd like to PR this, my method wouldn't be as clean as the rest of the code.
It is worthwhile having something that works! Perhaps I can help make it fit with the style of the rest of the code, or else it can serve as a public workaround for those that need it.
When sending multi-part uploads with an S3 Integrity Checksum, it fails with an error indicating not all parts have the checksum enabled.
I was able to enable ChecksumAlgorithm by adding s3_additional_kwargs to the S3FileSystem initialisation. Ex:
When sending a larger file using Multi-Part Uploads, it yields the following trace.
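The initialisation described above might look roughly like the sketch below. This is an assumption about the setup, not the original snippet: the "SHA256" value is inferred from the x-amz-checksum-sha256 header in the debug output, and make_filesystem is a made-up helper name.

```python
# Extra keyword arguments that s3fs forwards to the underlying S3 calls.
# "SHA256" is assumed here; S3 also documents CRC32, CRC32C and SHA1.
CHECKSUM_KWARGS = {"ChecksumAlgorithm": "SHA256"}


def make_filesystem():
    """Create an S3FileSystem with the checksum algorithm enabled (hypothetical helper)."""
    import s3fs  # third-party; imported lazily so this module loads without it

    return s3fs.S3FileSystem(s3_additional_kwargs=CHECKSUM_KWARGS)
```

Small files written through such a filesystem go up in a single request, while larger ones go through the multi-part path where the error shows up.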