aws / aws-cli

Universal Command Line Interface for Amazon Web Services

sync changed files to S3 based on checksum, if present #9074

Open tgpfeiffer opened 1 week ago

tgpfeiffer commented 1 week ago

Describe the feature

This is the same as https://github.com/aws/aws-cli/issues/8377 and https://github.com/aws/aws-cli/issues/7011: aws s3 sync should be able to detect files that need synchronization based on the checksum stored in S3, not size/timestamp.

Both issues were closed in favor of https://github.com/aws/aws-cli/issues/6750, but https://github.com/aws/aws-cli/issues/6750 only goes halfway: it uploads the checksum as metadata but doesn't take it into account when computing the sync candidates.

Use Case

A prototypical use case is a CI job that generates a set of files and then uploads them with aws s3 sync.

Because the files are generated in CI, they always carry the current timestamp, so all of them are synced to S3 even though only a few of them may have changed. --size-only is not a viable alternative because it skips changes that leave the file size unchanged, such as many typo fixes.
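To make the two failure modes concrete, here is a small local sketch (no S3 involved). It is only an illustration of the argument, not how aws s3 sync is implemented; `fingerprint` and `demo` are hypothetical names, and the rebuild is simulated by bumping the file's modification time.

```python
import hashlib
import os
import tempfile


def fingerprint(path):
    """Return (size, mtime_ns, sha256 hex digest) for a local file."""
    st = os.stat(path)
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return st.st_size, st.st_mtime_ns, digest


def demo():
    """Hypothetical illustration; not how aws s3 sync is implemented."""
    with tempfile.TemporaryDirectory() as tmp:
        page = os.path.join(tmp, "index.html")
        with open(page, "wb") as f:
            f.write(b"<p>teh docs</p>\n")
        size0, mtime0, sha0 = fingerprint(page)

        # CI rebuild with identical content: only the timestamp moves.
        later = mtime0 + 10 * 10**9
        os.utime(page, ns=(later, later))
        size1, mtime1, sha1 = fingerprint(page)

        # Typo fix of the same length: the size stays put, the content does not.
        with open(page, "wb") as f:
            f.write(b"<p>the docs</p>\n")
        size2, _, sha2 = fingerprint(page)

    return {
        "rebuild_changes_mtime": mtime1 != mtime0,
        "rebuild_changes_checksum": sha1 != sha0,
        "typo_fix_changes_size": size2 != size1,
        "typo_fix_changes_checksum": sha2 != sha1,
    }


if __name__ == "__main__":
    for name, value in demo().items():
        print(f"{name}: {value}")
```

A timestamp comparison flags the unchanged rebuild, a size comparison misses the typo fix, and only the checksum tracks the content in both cases.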

Proposed Solution

Add a flag such as --checksum-only. If that flag is present, then instead of retrieving each file's timestamp and size from S3 and comparing them to the local values, retrieve the checksum that was stored during the previous upload with --checksum-algorithm and compare it to the locally computed value.
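The comparison itself could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a proposal for the actual aws-cli code: `local_sha256_b64` and `needs_upload` are hypothetical names, and the remote value is passed in as a plain string. With boto3, that string could come from the `ChecksumSHA256` field of a `head_object` response requested with `ChecksumMode="ENABLED"`, which S3 reports base64-encoded. The sketch assumes a whole-object checksum; as far as I understand, objects uploaded in multiple parts can carry a composite checksum that will not equal the digest of the whole file, in which case this logic would simply re-upload.

```python
import base64
import hashlib
from typing import Optional


def local_sha256_b64(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Base64-encoded SHA-256 of a local file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


def needs_upload(path: str, remote_checksum: Optional[str]) -> bool:
    """Decide whether a file is a sync candidate based on checksum only.

    remote_checksum is the base64 SHA-256 stored with the S3 object, or
    None if the object is missing or was uploaded without a checksum.
    Timestamp and size are deliberately ignored (hypothetical behavior
    of the proposed --checksum-only flag).
    """
    if remote_checksum is None:
        return True
    return local_sha256_b64(path) != remote_checksum
```

Falling back to an upload when no checksum is stored keeps the behavior safe for objects that were uploaded before --checksum-algorithm was in use.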

Other Information

Current behavior:

$ aws s3 sync --checksum-algorithm=SHA256 testdata s3://my-bucket
upload: testdata/bar.txt to s3://my-bucket/bar.txt     
upload: testdata/foo.txt to s3://my-bucket/foo.txt      
upload: testdata/hoge/yq_linux_amd64 to s3://my-bucket/hoge/yq_linux_amd64

$ touch testdata/hoge/yq_linux_amd64

$ aws s3 sync --checksum-algorithm=SHA256 testdata s3://my-bucket
upload: testdata/hoge/yq_linux_amd64 to s3://my-bucket/hoge/yq_linux_amd64

Desired behavior:

$ aws s3 sync --checksum-only --checksum-algorithm=SHA256 testdata s3://my-bucket
upload: testdata/bar.txt to s3://my-bucket/bar.txt     
upload: testdata/foo.txt to s3://my-bucket/foo.txt      
upload: testdata/hoge/yq_linux_amd64 to s3://my-bucket/hoge/yq_linux_amd64

$ touch testdata/hoge/yq_linux_amd64

$ aws s3 sync --checksum-only --checksum-algorithm=SHA256 testdata s3://my-bucket
(no output)

Acknowledgements

CLI version used

aws-cli/2.21.0

Environment details (OS name and version, etc.)

exe/x86_64.opensuse-tumbleweed.20241107