Status: Closed (m-radzikowski closed 6 days ago)
Hi @m-radzikowski, thanks for the feature request. There has already been some discussion on the team about how these checksums could enhance commands like aws s3 cp and aws s3 sync, but it will take more time and discussion to think through the implementation. In the meantime, we can leave this issue open to track the request.
This would be a useful addition to the high-level commands. For reference, here is a solution using s3api. It would have been nice if MD5 digests were included as an option.

```shell
# aws-cli version 2.7.16
# https://aws.amazon.com/blogs/aws/new-additional-checksum-algorithms-for-amazon-s3/

# compute and save checksum on upload / copy
# algorithms supported: crc32 crc32c sha1 sha256
aws s3api put-object --body <file_name> --checksum-algorithm crc32 --bucket <bucket_name> --key <key_name>

# retrieve the checksum
# ChecksumCRC32 ChecksumCRC32C ChecksumSHA1 ChecksumSHA256
aws s3api head-object --bucket <bucket_name> --key <key_name> --checksum-mode Enabled --query ChecksumCRC32 --output text
```
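Building on the s3api commands above, here is a minimal sketch that uploads a file and then checks the checksum S3 stored against one computed locally. It uses SHA256 instead of CRC32 only because a base64-encoded SHA-256 is easy to produce with `openssl`. The function names `local_sha256_b64` and `put_and_verify` are hypothetical, and the sketch assumes `openssl`, `base64` and a configured AWS CLI are available. It also assumes a single-request upload (put-object), where S3 stores a checksum of the whole object.

```shell
# Hypothetical helper: base64-encoded SHA-256 of a file, the encoding
# S3 uses for ChecksumSHA256.
local_sha256_b64() {
  openssl dgst -sha256 -binary "$1" | base64
}

# Hypothetical helper: upload FILE with put-object, then compare the
# checksum S3 reports with the locally computed one.
# Usage: put_and_verify BUCKET KEY FILE
put_and_verify() {
  local bucket="$1" key="$2" file="$3"
  local expected actual
  expected="$(local_sha256_b64 "$file")"

  # Ask S3 to compute and store a SHA-256 checksum on upload.
  aws s3api put-object --body "$file" --checksum-algorithm SHA256 \
    --bucket "$bucket" --key "$key" > /dev/null

  # Read the stored checksum back (checksum mode must be enabled).
  actual="$(aws s3api head-object --bucket "$bucket" --key "$key" \
    --checksum-mode ENABLED --query ChecksumSHA256 --output text)"

  if [[ "$expected" == "$actual" ]]; then
    echo "OK $key $actual"
  else
    echo "MISMATCH $key local=$expected s3=$actual" >&2
    return 1
  fi
}
```

For example, `put_and_verify my-test-bucket hello_checksum.txt hello.txt` prints `OK` followed by the key and checksum when both sides agree, and returns a non-zero status otherwise.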
+1 to support for checksums when syncing.
+1 on this feature.
+1
I've got a client migrating a small but critical dataset to S3, and they have strict requirements for data integrity validation. With checksum support missing from the S3 sync higher-level command, we expect an increased effort to meet the client's requirements. This is a significant gap as far as missing functionality goes.
Works fine for me with the latest AWS CLI (2.12.16). Note that the checksum is base64-encoded, as detailed here: https://aws.amazon.com/getting-started/hands-on/amazon-s3-with-additional-checksums/?ref=docs_gateway/amazons3/checking-object-integrity.html
```shell
BUCKET=my-test-bucket
KEY=hello_checksum.txt
echo "Hello world!" > hello.txt

# Compute base64 encoded sha256
shasum -a 256 hello.txt | cut -f1 -d' ' | xxd -r -p | base64
# C6kE6uh3O3DHUzPbTeLzrEWorU3bobJC8LPPwZk5Hdg=

# compute and save checksum on upload / copy
# algorithms supported: crc32 crc32c sha1 sha256
aws s3api put-object --body hello.txt --checksum-algorithm sha256 --bucket "${BUCKET}" --key "${KEY}"

# retrieve the checksum
aws s3api head-object --bucket "${BUCKET}" --key "${KEY}" --checksum-mode Enabled --query ChecksumSHA256 --output text
# C6kE6uh3O3DHUzPbTeLzrEWorU3bobJC8LPPwZk5Hdg=
```
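S3 reports ChecksumSHA256 base64-encoded, while `shasum` and `sha256sum` print hex. As an alternative to the `xxd` pipeline above, this sketch goes the other way: it decodes the S3 value to hex so it can be compared with a standard hex digest. The helpers `b64_to_hex` and `matches_s3_sha256` are hypothetical, and the sketch assumes GNU coreutils (`base64 -d`, `od`, `sha256sum`); on macOS you may need `shasum -a 256` instead.

```shell
# Hypothetical helper: decode a base64 checksum (as returned by S3)
# into a lowercase hex string. -v stops od from collapsing repeated lines.
b64_to_hex() {
  printf '%s' "$1" | base64 -d | od -An -v -tx1 | tr -d ' \n'
}

# Hypothetical helper: succeed if FILE's SHA-256 equals the base64
# checksum reported by head-object.
# Usage: matches_s3_sha256 FILE S3_CHECKSUM_B64
matches_s3_sha256() {
  local local_hex
  local_hex="$(sha256sum "$1" | cut -d' ' -f1)"
  [[ "$local_hex" == "$(b64_to_hex "$2")" ]]
}
```

For example, `matches_s3_sha256 hello.txt "$(aws s3api head-object --bucket "$BUCKET" --key "$KEY" --checksum-mode ENABLED --query ChecksumSHA256 --output text)" && echo match`.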
On Mon, Jul 3, 2023 at 3:02 PM Sarthak Jain wrote:
@rajivnarayan I stumbled upon this comment: https://github.com/aws/aws-cli/issues/6750#issuecomment-1195959947. The SHA-256 checksum value returned from AWS doesn't seem to be right. Additionally, according to the CLI docs (https://docs.aws.amazon.com/cli/latest/reference/s3api/put-object.html), the --checksum-algorithm parameter is only supported when using the SDK. Have you run into the SHA-256 value not being calculated correctly?
> But it will take more time and discussion to think through the implementation.

@tim-finnigan could you perhaps elaborate on what the key problems are with adding checksum support to the s3 commands? Since it is supported by the low-level s3api commands, I'd expect support in the high-level commands to be straightforward. Other libraries such as boto3 support S3 checksum computation in their high-level API functions (https://boto3.amazonaws.com/v1/documentation/api/latest/reference/customizations/s3.html#boto3.s3.transfer.S3Transfer.ALLOWED_UPLOAD_ARGS).

I believe most use cases rely on the high-level s3 commands, aws s3 cp or aws s3 sync. Can we get more information to help think through the implementation?
I support adding this to the high-level aws s3 sync command. However, this feature should be temporarily disabled until it is fixed. We have no idea when this "new" feature will arrive (the thread has been open for over a year), but the sync command misleads users into thinking their files have been synchronized when that is not always the case. This could cause financial loss to a company if the wrong object is synchronized. I have had to work around this aws s3 sync issue to make sure that files with a different md5sum but the same file size get uploaded (I have stopped using aws s3 sync for now).
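By default aws s3 sync decides what to upload from file size and modification time, so a same-size file with different content can be skipped. A workaround along the lines described above can be sketched by comparing a local MD5 with the object's ETag. This is hedged: `differs_from_s3` is a hypothetical name, and it relies on the ETag being the hex MD5 of the object, which I'd only expect for single-part uploads without SSE-KMS or SSE-C encryption. Multipart ETags end in `-<parts>` and are reported as not comparable.

```shell
# Hypothetical helper: does a local file differ from an S3 object?
# Returns 0 if the contents differ, 1 if the MD5 matches the ETag,
# 2 if the ETag is a multipart ETag and cannot be compared with an MD5.
# Assumes a single-part, non-SSE-KMS/SSE-C object (ETag == hex MD5).
# Usage: differs_from_s3 BUCKET KEY FILE
differs_from_s3() {
  local bucket="$1" key="$2" file="$3" etag local_md5
  etag="$(aws s3api head-object --bucket "$bucket" --key "$key" \
    --query ETag --output text)"
  etag="${etag//\"/}"   # S3 returns the ETag wrapped in double quotes
  if [[ "$etag" == *-* ]]; then
    echo "multipart ETag for $key, cannot compare with MD5" >&2
    return 2
  fi
  local_md5="$(md5sum "$file" | cut -d' ' -f1)"   # GNU coreutils; `md5 -q` on macOS
  if [[ "$local_md5" == "$etag" ]]; then
    return 1
  fi
  return 0
}
```

A caller could upload whenever the status is not 1, e.g. `differs_from_s3 "$BUCKET" "$KEY" "$FILE"; [ $? -ne 1 ] && aws s3 cp "$FILE" "s3://$BUCKET/$KEY"`, which errs on the side of uploading when the ETag cannot be compared.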
May I have an update on the aws s3 sync bug? It is making it very inconvenient to sync files from AWS S3 right now.
This feature has been released in version 2.18.0. Closing the issue.
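With 2.18.0 or later, the high-level command should be able to request a checksum directly. The sketch below assumes the option carries the name proposed in this request, `--checksum-algorithm` (check `aws s3 cp help` on your installed version), and reads the stored value back with the same head-object call used earlier in the thread. `cp_and_show_checksum` is a hypothetical name. Files above the CLI's multipart threshold (8 MB by default) are uploaded in parts, so the printed value may be a composite checksum rather than a whole-file hash.

```shell
# Hypothetical wrapper; assumes AWS CLI >= 2.18.0 and that `aws s3 cp`
# accepts --checksum-algorithm (the option requested in this issue).
# Usage: cp_and_show_checksum FILE BUCKET KEY
cp_and_show_checksum() {
  local file="$1" bucket="$2" key="$3"
  aws s3 cp "$file" "s3://$bucket/$key" --checksum-algorithm SHA256 > /dev/null || return
  # Read back the checksum S3 stored for the object (base64-encoded).
  aws s3api head-object --bucket "$bucket" --key "$key" \
    --checksum-mode ENABLED --query ChecksumSHA256 --output text
}
```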
Is your feature request related to a problem? Please describe.
The newly released additional S3 checksums feature enhances SDK operations by calculating the selected checksum value on file upload, including multipart uploads. However, this feature is not available in the high-level S3 commands.
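For multipart uploads, my understanding of the S3 documentation is that S3 does not store a SHA-256 of the whole file; it stores a checksum of the part checksums, reported as `<base64>-<number of parts>`. The sketch below reproduces that locally under two assumptions: the part size matches the one used for the upload (8 MiB is the AWS CLI's default `multipart_chunksize`), and the composite format is as just described. `composite_sha256` is a hypothetical name; it needs `split`, `openssl` and `base64`.

```shell
# Hypothetical helper: composite SHA-256 of FILE as S3 reports it for a
# multipart upload, assuming every part except the last is PART_SIZE bytes.
# Usage: composite_sha256 FILE [PART_SIZE_BYTES]
composite_sha256() {
  local file="$1" part_size="${2:-8388608}"   # 8 MiB, AWS CLI default multipart_chunksize
  local tmp part n=0
  tmp="$(mktemp -d)"
  # Split into fixed-size parts; 4-letter suffixes keep the glob in part order.
  split -b "$part_size" -a 4 "$file" "$tmp/part."
  for part in "$tmp"/part.*; do
    [[ -e "$part" ]] || continue   # glob did not match: no parts produced
    # Append the raw (binary) SHA-256 digest of each part.
    openssl dgst -sha256 -binary "$part" >> "$tmp/digests"
    n=$((n + 1))
  done
  (( n > 0 )) || { rm -rf "$tmp"; return 1; }   # empty file: nothing to hash
  # Hash the concatenated digests, base64-encode, and append "-<parts>".
  printf '%s-%d\n' "$(openssl dgst -sha256 -binary "$tmp/digests" | base64)" "$n"
  rm -rf "$tmp"
}
```

Objects uploaded in a single request (for example with put-object) get a plain whole-object checksum, so this composite value is only expected to match objects uploaded in parts.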
Describe the solution you'd like
A --checksum-algorithm parameter in the aws s3 commands, especially in aws s3 cp.
Describe alternatives you've considered
Using low-level commands.