Describe the feature
This is the same as https://github.com/aws/aws-cli/issues/8377 and https://github.com/aws/aws-cli/issues/7011: aws s3 sync should be able to detect files that need synchronization based on the checksum stored in S3, not size/timestamp. Both issues were closed in favor of https://github.com/aws/aws-cli/issues/6750, but that issue only goes halfway: it uploads the checksum as metadata, but doesn't take it into account when computing the sync candidates.
Use Case
A prototypical use case looks like this:
- generate files (e.g., documentation) from a source in CI
- sync the output to S3
Because the files are generated in CI, they always have the current timestamp, so all of them will be synced to S3, even though only a few of them may have changed. --size-only is not a viable alternative, as it skips, for example, typo fixes that often don't change the file size.
Proposed Solution
Add a flag --checksum-only or the like. If that flag is present, rather than retrieving the file timestamp and size from S3 and comparing them to the local values, retrieve the checksum that was stored during the previous upload using --checksum-algorithm and compare it to the locally computed value.
Other Information
Current behavior:
$ aws s3 sync --checksum-algorithm=SHA256 testdata s3://my-bucket
upload: testdata/bar.txt to s3://my-bucket/bar.txt
upload: testdata/foo.txt to s3://my-bucket/foo.txt
upload: testdata/hoge/yq_linux_amd64 to s3://my-bucket/hoge/yq_linux_amd64
$ touch testdata/hoge/yq_linux_amd64
$ aws s3 sync --checksum-algorithm=SHA256 testdata s3://my-bucket
upload: testdata/hoge/yq_linux_amd64 to s3://my-bucket/hoge/yq_linux_amd64
Desired behavior:
$ aws s3 sync --checksum-only --checksum-algorithm=SHA256 testdata s3://my-bucket
upload: testdata/bar.txt to s3://my-bucket/bar.txt
upload: testdata/foo.txt to s3://my-bucket/foo.txt
upload: testdata/hoge/yq_linux_amd64 to s3://my-bucket/hoge/yq_linux_amd64
$ touch testdata/hoge/yq_linux_amd64
$ aws s3 sync --checksum-only --checksum-algorithm=SHA256 testdata s3://my-bucket
(no output)
Acknowledgements
[ ] I may be able to implement this feature request
CLI version used
aws-cli/2.21.0
Environment details (OS name and version, etc.)
exe/x86_64.opensuse-tumbleweed.20241107
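Illustrative sketch
As a rough illustration of the comparison described under Proposed Solution, the following Python sketch decides which files would need uploading based on content checksums alone. It is not how aws s3 sync is implemented. The function names and the remote_checksums mapping are hypothetical. The sketch assumes the stored checksums have already been fetched into that mapping (for example via HeadObject with checksum mode enabled), and that each object was uploaded in a single part, so the stored SHA-256 value is the base64-encoded digest of the whole file. Multipart uploads may store a composite checksum, which this sketch does not handle.

```python
import base64
import hashlib
from pathlib import Path
from typing import Dict, List


def local_sha256_b64(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the base64-encoded SHA-256 digest of a file's content."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


def files_to_upload(local_root: Path, remote_checksums: Dict[str, str]) -> List[str]:
    """Return the keys that are missing remotely or whose content differs.

    remote_checksums is a hypothetical mapping of object key to the
    checksum stored at upload time. Timestamps and sizes are ignored.
    """
    changed = []
    for path in sorted(p for p in local_root.rglob("*") if p.is_file()):
        key = path.relative_to(local_root).as_posix()
        if remote_checksums.get(key) != local_sha256_b64(path):
            changed.append(key)
    return changed
```

With an empty mapping every file is reported, which corresponds to the first sync in the example above. After a touch that changes only the modification time, the checksums still match and nothing is reported, which is the desired behavior.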