aws / aws-cli

Universal Command Line Interface for Amazon Web Services

Support S3 additional checksums in high-level S3 commands #6750

Closed m-radzikowski closed 6 days ago

m-radzikowski commented 2 years ago

Is your feature request related to a problem? Please describe.

The newly released additional S3 checksums feature enhances SDK operations by calculating the selected checksum value on file upload, including multipart uploads. However, this feature is not available in the high-level S3 commands.

Describe the solution you'd like

A --checksum-algorithm parameter for the aws s3 commands, especially aws s3 cp.

Describe alternatives you've considered

Using low-level commands.

tim-finnigan commented 2 years ago

Hi @m-radzikowski, thanks for the feature request. There has already been some discussion on the team about how these checksums could enhance commands like aws s3 cp and aws s3 sync, but it will take more time and discussion to think through the implementation. In the meantime, we can leave this issue open to track the request.

rajivnarayan commented 2 years ago

This would be a useful addition to the high-level commands. For reference, here is a solution using s3api. It would have been nice if MD5 digests were included as an option.

```shell
# aws-cli version 2.7.16
# https://aws.amazon.com/blogs/aws/new-additional-checksum-algorithms-for-amazon-s3/

# compute and save checksum on upload / copy
# algorithms supported: crc32 crc32c sha1 sha256
aws s3api put-object --body <file_name> --checksum-algorithm crc32 --bucket <bucket_name> --key <key_name>

# retrieve the checksum
# ChecksumCRC32 ChecksumCRC32C ChecksumSHA1 ChecksumSHA256
aws s3api head-object --bucket <bucket_name> --key <key_name> --checksum-mode Enabled --query ChecksumCRC32 --output text
```
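
Because the `--query` field has to match the algorithm chosen at upload time, a small helper can derive one from the other so they cannot drift apart. This is a minimal sketch, not part of the AWS CLI: `checksum_field` and `get_checksum` are hypothetical names, the mapping assumes the four field names listed in the comment above, and it passes the documented `ENABLED` value to `--checksum-mode`.

```shell
#!/usr/bin/env bash
# Map an algorithm name (crc32, crc32c, sha1, sha256; any case) to the
# head-object output field that holds its base64-encoded value.
checksum_field() {
  local alg
  alg=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  case "$alg" in
    CRC32|CRC32C|SHA1|SHA256) printf 'Checksum%s\n' "$alg" ;;
    *) echo "unsupported algorithm: $1" >&2; return 1 ;;
  esac
}

# Hypothetical wrapper: print the stored checksum of s3://BUCKET/KEY.
# Usage: get_checksum BUCKET KEY ALGORITHM
get_checksum() {
  local field
  field=$(checksum_field "$3") || return 1
  aws s3api head-object --bucket "$1" --key "$2" \
    --checksum-mode ENABLED --query "$field" --output text
}
```

For example, `get_checksum <bucket_name> <key_name> crc32` would query `ChecksumCRC32`, the same field used in the command above.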

jonathansampson commented 2 years ago

+1 to support for checksums when syncing.

saksham commented 1 year ago

+1 on this feature.

genvidkyle commented 1 year ago

+1

jbutz commented 1 year ago

I've got a client migrating a small but critical dataset to S3, and they have strict requirements for data integrity validation. With checksum support missing from the high-level aws s3 sync command, we expect extra effort to meet the client's requirements. This is a significant gap in functionality.

ashepherd commented 1 year ago

+1

MaksymSimchuk-prxt commented 1 year ago

+1

khilnani commented 1 year ago

+1

animeshsg commented 1 year ago

+1

sarthakjain271095 commented 1 year ago

@rajivnarayan, I stumbled upon this. The SHA-256 checksum value returned by AWS doesn't seem to be right. Additionally, according to the CLI docs, the --checksum-algorithm parameter is only supported when using the SDK. Have you run into issues with the SHA-256 value not being calculated correctly?

rajivnarayan commented 1 year ago

Works fine for me with the latest AWS CLI (2.12.16). Note that the checksum is base64-encoded, as detailed here: https://aws.amazon.com/getting-started/hands-on/amazon-s3-with-additional-checksums/?ref=docs_gateway/amazons3/checking-object-integrity.html

```shell
BUCKET=my-test-bucket
KEY=hello_checksum.txt
echo "Hello world!" > hello.txt

# Compute base64 encoded sha256
shasum -a 256 hello.txt | cut -f1 -d' ' | xxd -r -p | base64
# C6kE6uh3O3DHUzPbTeLzrEWorU3bobJC8LPPwZk5Hdg=

# compute and save checksum on upload / copy
# algorithms supported: crc32 crc32c sha1 sha256
aws s3api put-object --body hello.txt --checksum-algorithm sha256 --bucket "${BUCKET}" --key "${KEY}"

# retrieve the checksum
aws s3api head-object --bucket "${BUCKET}" --key "${KEY}" --checksum-mode Enabled --query ChecksumSHA256 --output text
# C6kE6uh3O3DHUzPbTeLzrEWorU3bobJC8LPPwZk5Hdg=
```
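
Building on these commands, the local and remote values can be compared in one step. This is a hedged sketch with hypothetical function names (`local_sha256_b64`, `verify_sha256`); it prefers `openssl dgst -sha256 -binary` and falls back to the hex-to-binary pipeline shown above. It assumes a single-part upload such as `put-object`: for multipart uploads, S3 typically reports a checksum of the part checksums (with a `-N` suffix), which will not equal the whole-file digest.

```shell
#!/usr/bin/env bash
# Base64-encoded SHA-256 of a file, the format S3 reports in ChecksumSHA256.
# Prefers openssl; otherwise converts the hex digest to binary with xxd.
local_sha256_b64() {
  if command -v openssl >/dev/null 2>&1; then
    openssl dgst -sha256 -binary "$1" | base64
  elif command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d' ' -f1 | xxd -r -p | base64
  else
    shasum -a 256 "$1" | cut -d' ' -f1 | xxd -r -p | base64
  fi
}

# Hypothetical check: succeed only if the object's stored SHA-256 matches
# the local file. Assumes the object was uploaded in a single part with
# --checksum-algorithm sha256; otherwise the query prints "None".
verify_sha256() {
  local file=$1 bucket=$2 key=$3 want have
  want=$(local_sha256_b64 "$file")
  have=$(aws s3api head-object --bucket "$bucket" --key "$key" \
    --checksum-mode ENABLED --query ChecksumSHA256 --output text) || return 1
  if [ "$want" = "$have" ]; then
    echo "OK: $key matches $file"
  else
    echo "MISMATCH: local=$want remote=$have" >&2
    return 1
  fi
}
```

With the variables from the example, `verify_sha256 hello.txt "$BUCKET" "$KEY"` should print `OK` when the upload is intact.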

dpeger commented 1 year ago

> But it will take more time and discussion to think through the implementation.

@tim-finnigan, could you perhaps elaborate on the key problems with adding checksum support to the s3 commands? Since it is supported by the low-level s3api commands, I'd expect support in the high-level commands to be straightforward. Other libraries such as boto3 support S3 checksum computation in their high-level API functions (https://boto3.amazonaws.com/v1/documentation/api/latest/reference/customizations/s3.html#boto3.s3.transfer.S3Transfer.ALLOWED_UPLOAD_ARGS).

Park-minkyu commented 1 year ago

I believe most use cases probably rely on the high-level commands aws s3 cp or aws s3 sync. Could we have more information about what needs to be thought through for the implementation?

YoongLoong commented 11 months ago

I support adding this to the high-level aws s3 sync command; however, this feature should be temporarily disabled until it is fixed. We have no idea when this "new" feature will arrive (the thread has been open for over a year), but the sync command misleads users into believing they have "synchronized" their files when that is not always the case. This could cause financial loss to a company if the "wrong" object is synchronized. I am forced to use a workaround for this aws s3 sync issue to ensure that files with a different MD5 sum but the same file size get uploaded (I have stopped using aws s3 sync for now).
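
The comment does not say what the workaround is. One possible approach, offered as an assumption rather than the commenter's method: `aws s3 sync` decides whether to upload based on file size and modification time, so an edit that keeps the same size can be skipped. For objects uploaded in a single part without SSE-KMS, the S3 ETag is the hex MD5 of the content, so comparing it with a local MD5 can flag files to re-copy with `aws s3 cp`. The function names are hypothetical; multipart ETags (ending in `-N`) are not plain MD5s and are reported as not comparable.

```shell
#!/usr/bin/env bash
# Compare a local file's MD5 with an S3 ETag string.
# Returns 0 if they match, 1 if they differ, 2 if the ETag is not a plain
# MD5 (for example a multipart ETag such as "abc...-3").
etag_matches_md5() {
  local file=$1 etag=${2//\"/}   # strip the quotes S3 puts around ETags
  case "$etag" in
    *-*) return 2 ;;
  esac
  local md5
  if command -v md5sum >/dev/null 2>&1; then
    md5=$(md5sum "$file" | cut -d' ' -f1)
  else
    md5=$(md5 -q "$file")        # macOS
  fi
  [ "$md5" = "$etag" ]
}

# Hypothetical re-upload step: fetch the ETag and force a copy on mismatch.
# Usage: reupload_if_changed FILE BUCKET KEY
reupload_if_changed() {
  local file=$1 bucket=$2 key=$3 etag rc
  etag=$(aws s3api head-object --bucket "$bucket" --key "$key" \
    --query ETag --output text) || return 1
  etag_matches_md5 "$file" "$etag" && rc=0 || rc=$?
  case $rc in
    0) echo "unchanged: $key" ;;
    1) aws s3 cp "$file" "s3://$bucket/$key" ;;
    *) echo "cannot compare (multipart ETag): $key" >&2 ;;
  esac
}
```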

YoongLoong commented 9 months ago

Could we get an update on the aws s3 sync issue? It is currently causing a lot of inconvenience when syncing files from AWS S3.

aemous commented 6 days ago

This feature has been released in version 2.18.0. Closing issue.
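
Since the closing comment names 2.18.0 as the release with this feature, scripts that must also run on older installs can check the version first. A minimal sketch, assuming the `aws --version` output starts with `aws-cli/X.Y.Z` and that the high-level commands accept `--checksum-algorithm` from 2.18.0 on; `cli_at_least` and `upload_with_sha256` are hypothetical names.

```shell
#!/usr/bin/env bash
# Succeed if the "aws --version" string in $1 reports at least the dotted
# version in $2 (default 2.18.0, the release named in the closing comment).
cli_at_least() {
  local have=${1#aws-cli/}     # "aws-cli/2.18.0 Python/..." -> "2.18.0 Python/..."
  have=${have%% *}             # -> "2.18.0"
  local want=${2:-2.18.0}
  local -a h w
  IFS=. read -r -a h <<< "$have"
  IFS=. read -r -a w <<< "$want"
  local i a b
  for i in 0 1 2; do
    a=${h[i]:-0} b=${w[i]:-0}
    if (( 10#$a > 10#$b )); then return 0; fi
    if (( 10#$a < 10#$b )); then return 1; fi
  done
  return 0
}

# Hypothetical upload helper: use the high-level command when the installed
# CLI is new enough, otherwise fall back to s3api put-object.
# Usage: upload_with_sha256 FILE BUCKET KEY
upload_with_sha256() {
  local file=$1 bucket=$2 key=$3
  if cli_at_least "$(aws --version 2>&1)"; then
    aws s3 cp "$file" "s3://$bucket/$key" --checksum-algorithm SHA256
  else
    aws s3api put-object --body "$file" --bucket "$bucket" --key "$key" \
      --checksum-algorithm SHA256
  fi
}
```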

github-actions[bot] commented 6 days ago

This issue is now closed. Comments on closed issues are hard for our team to see. If you need more assistance, please open a new issue that references this one.