aldy120 / s3-note

Note for Amazon S3

Checksum #34

Open aldy120 opened 1 year ago

aldy120 commented 1 year ago
echo hi > test1.txt
echo -n hi > test2.txt

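The -n flag suppresses the trailing newline, so test1.txt is 3 bytes ("hi" plus a newline) and test2.txt is 2 bytes.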

aldy120 commented 1 year ago

SHA256

openssl dgst -sha256 test1.txt 
SHA256(test1.txt)= 98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4
openssl dgst -sha256 test2.txt 
SHA256(test2.txt)= 8f434346648f6b96df89dda901c5176b10a6d83961dd3c1ac88b59b2dc327aa4

Convert to base64

echo "98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4" | xxd -r -p | base64
mOpuTyFvL7S2n/+bOkSELDhobKaF8/VdxIxdP7EQe+Q=
echo "8f434346648f6b96df89dda901c5176b10a6d83961dd3c1ac88b59b2dc327aa4" | xxd -r -p | base64
j0NDRmSPa5bfid2pAcUXaxCm2Dlh3TwayItZstwyeqQ=
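
The same hex-to-base64 conversion can be sketched in Python, as a rough equivalent of the `openssl | xxd -r -p | base64` pipeline above. The helper name `b64_digest` is made up for this note. The important detail is that base64 is applied to the raw digest bytes, not to the hex string.

```python
import base64
import hashlib


def b64_digest(data: bytes, algorithm: str = "sha256") -> str:
    """Return base64 of the raw digest bytes (hypothetical helper name)."""
    raw = hashlib.new(algorithm, data).digest()
    return base64.b64encode(raw).decode("ascii")


# test1.txt contains "hi\n" (echo), test2.txt contains "hi" (echo -n).
print(b64_digest(b"hi\n"))  # mOpuTyFvL7S2n/+bOkSELDhobKaF8/VdxIxdP7EQe+Q=
print(b64_digest(b"hi"))    # j0NDRmSPa5bfid2pAcUXaxCm2Dlh3TwayItZstwyeqQ=
```

Passing `"sha1"` or `"md5"` as the second argument should give the base64 form of those digests in the same way.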
aldy120 commented 1 year ago

SHA1

openssl dgst -sha1 test1.txt
SHA1(test1.txt)= 55ca6286e3e4f4fba5d0448333fa99fc5a404a73
openssl dgst -sha1 test2.txt
SHA1(test2.txt)= c22b5f9178342609428d6f51b2c5af4c0bde6a42

Convert to base64

echo "55ca6286e3e4f4fba5d0448333fa99fc5a404a73" | xxd -r -p | base64
VcpihuPk9Pul0ESDM/qZ/FpASnM=
echo "c22b5f9178342609428d6f51b2c5af4c0bde6a42" | xxd -r -p | base64
witfkXg0JglCjW9RssWvTAveakI=
aldy120 commented 1 year ago

CRC32

crc32 test1.txt
ed6f7a7a
crc32 test2.txt
d8932aac

Convert to base64

echo "ed6f7a7a" | xxd -r -p | base64
7W96eg==
echo "d8932aac" | xxd -r -p | base64
2JMqrA==
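
A minimal Python sketch of the same CRC32 step, assuming the `crc32` command above computes the standard CRC-32 that `zlib.crc32` also implements. The helper name `crc32_b64` is made up for this note. The 32-bit value is written as 4 big-endian bytes before base64 encoding, which is what `xxd -r -p` does with the 8 hex digits.

```python
import base64
import zlib


def crc32_b64(data: bytes) -> str:
    """Return base64 of the CRC-32 value encoded as 4 big-endian bytes."""
    value = zlib.crc32(data) & 0xFFFFFFFF  # force an unsigned 32-bit value
    return base64.b64encode(value.to_bytes(4, "big")).decode("ascii")


print(crc32_b64(b"hi\n"))  # contents of test1.txt
print(crc32_b64(b"hi"))    # contents of test2.txt
```

If the assumption holds, the two printed values should match the `7W96eg==` and `2JMqrA==` results above.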
aldy120 commented 1 year ago

CRC32C

gsutil can generate a CRC32C checksum.

sudo pip3 install gsutil
gsutil hash test1.txt
Hashes [base64] for test1.txt:
        Hash (crc32c):          G9ywgw==
        Hash (md5):             dk76iD3aHhHbR2ccSju9ng==

Operation completed over 1 objects/3.0 B.                                        
gsutil hash test2.txt
Hashes [base64] for test2.txt:
        Hash (crc32c):          9Z3Zwg==
        Hash (md5):             SfaKXIST7CwL9ImCHCH8Ow==

Operation completed over 1 objects/2.0 B.           
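
If installing gsutil is not an option, CRC32C can also be sketched in pure Python. The block below is a slow, bit-by-bit version using the reflected Castagnoli polynomial 0x82F63B78. It is not how gsutil computes it, and the function names are made up for this note. It is only meant for small test files like these.

```python
import base64


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; apply the polynomial when the low bit was set.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def crc32c_b64(data: bytes) -> str:
    """Return base64 of the CRC-32C value encoded as 4 big-endian bytes."""
    return base64.b64encode(crc32c(data).to_bytes(4, "big")).decode("ascii")


print(crc32c_b64(b"hi\n"))  # contents of test1.txt
print(crc32c_b64(b"hi"))    # contents of test2.txt
```

The output should be comparable with the `Hash (crc32c)` lines that gsutil prints above.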
aldy120 commented 1 year ago

A plain head-object call does not return the checksum.

aws s3api head-object --bucket test-dub-12345678 --key checksum/test1.txt
{
    "AcceptRanges": "bytes",
    "LastModified": "2022-12-14T15:34:37+00:00",
    "ContentLength": 3,
    "ETag": "\"764efa883dda1e11db47671c4a3bbd9e\"",
    "VersionId": "null",
    "ContentType": "text/plain",
    "Metadata": {}
}
aldy120 commented 1 year ago

MD5

openssl dgst -md5 test1.txt
MD5(test1.txt)= 764efa883dda1e11db47671c4a3bbd9e
openssl dgst -md5 test2.txt
MD5(test2.txt)= 49f68a5c8493ec2c0bf489821c21fc3b
aldy120 commented 1 year ago

The CLI shows the checksum only when --checksum-mode ENABLED is added.

aws s3api head-object --bucket test-dub-12345678 --key checksum/test1.txt --checksum-mode ENABLED
{
    "AcceptRanges": "bytes",
    "LastModified": "2022-12-14T15:34:37+00:00",
    "ContentLength": 3,
    "ChecksumSHA256": "mOpuTyFvL7S2n/+bOkSELDhobKaF8/VdxIxdP7EQe+Q=",
    "ETag": "\"764efa883dda1e11db47671c4a3bbd9e\"",
    "VersionId": "null",
    "ContentType": "text/plain",
    "Metadata": {}
}
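
To check a local file against that response, one option is to parse the CLI's JSON output and compare `ChecksumSHA256` with a locally computed digest. This is a minimal sketch. The function name `matches_sha256` is made up for this note, and it assumes the object was uploaded with a SHA-256 checksum, as in the output above.

```python
import base64
import hashlib
import json


def matches_sha256(head_object_json: str, data: bytes):
    """Compare ChecksumSHA256 from head-object JSON with a local digest.

    Returns True or False, or None when the response has no ChecksumSHA256
    field (for example when --checksum-mode ENABLED was omitted).
    """
    reported = json.loads(head_object_json).get("ChecksumSHA256")
    if reported is None:
        return None
    local = base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")
    return reported == local


# Trimmed copy of the head-object response shown above.
response = '{"ContentLength": 3, "ChecksumSHA256": "mOpuTyFvL7S2n/+bOkSELDhobKaF8/VdxIxdP7EQe+Q="}'
print(matches_sha256(response, b"hi\n"))  # True: test1.txt matches
print(matches_sha256(response, b"hi"))    # False: test2.txt does not
```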

Alternatively, use get-object-attributes.

aws s3api get-object-attributes --bucket test-dub-12345678 --key checksum/test1.txt --object-attributes "ETag" "Checksum" "ObjectParts" "StorageClass" "ObjectSize"
{
    "LastModified": "2022-12-14T15:34:37+00:00",
    "VersionId": "null",
    "ETag": "764efa883dda1e11db47671c4a3bbd9e",
    "Checksum": {
        "ChecksumSHA256": "mOpuTyFvL7S2n/+bOkSELDhobKaF8/VdxIxdP7EQe+Q="
    },
    "StorageClass": "STANDARD",
    "ObjectSize": 3
}
aws s3api get-object-attributes --bucket test-dub-12345678 --key temp_10GB_file --object-attributes "ETag" "Checksum" "ObjectParts" "StorageClass" "ObjectSize"
{
    "LastModified": "2022-08-02T16:39:59+00:00",
    "VersionId": "null",
    "ETag": "21697dce4bbfa7556394cff44fc885ae-640",
    "ObjectParts": {
        "TotalPartsCount": 640
    },
    "StorageClass": "STANDARD",
    "ObjectSize": 10737418240
}
aldy120 commented 1 year ago

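Any of these request headers can be sent with an upload so that the S3 server verifies the data it receives. The x-amz-checksum-* headers are listed with the matching API field name.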

Content-MD5
x-amz-checksum-crc32: ChecksumCRC32
x-amz-checksum-crc32c: ChecksumCRC32C
x-amz-checksum-sha1: ChecksumSHA1
x-amz-checksum-sha256: ChecksumSHA256
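
As a rough illustration of what goes into these headers, the sketch below computes a header name and value for a request body. In each case the value is the base64 of the raw digest bytes, matching the conversions earlier in this issue. The names `HEADER_NAMES` and `checksum_header` are made up for this note. CRC32C is left out because the Python standard library has no implementation of it.

```python
import base64
import hashlib
import zlib

# Header names from the list above, keyed by a short algorithm label.
HEADER_NAMES = {
    "md5": "Content-MD5",
    "crc32": "x-amz-checksum-crc32",
    "sha1": "x-amz-checksum-sha1",
    "sha256": "x-amz-checksum-sha256",
}


def checksum_header(body: bytes, algorithm: str = "sha256"):
    """Return a (header name, base64 value) pair for the request body."""
    if algorithm not in HEADER_NAMES:
        raise ValueError("unsupported algorithm: " + algorithm)
    if algorithm == "crc32":
        raw = (zlib.crc32(body) & 0xFFFFFFFF).to_bytes(4, "big")
    else:
        raw = hashlib.new(algorithm, body).digest()
    return HEADER_NAMES[algorithm], base64.b64encode(raw).decode("ascii")


print(checksum_header(b"hi\n", "md5"))
print(checksum_header(b"hi\n", "sha256"))
```

For test1.txt, the SHA-256 value should equal the `ChecksumSHA256` that S3 reports.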
aldy120 commented 1 year ago

The ETag equals the object's MD5 only when the object was not uploaded with multipart and does not use SSE-C or KMS encryption.

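With multipart, S3 computes the MD5 of each part, concatenates those MD5 digests, computes one more MD5 over the concatenation, and finally appends the number of parts to form the ETag. If an additional checksum is selected, it is calculated in the same way.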

Amazon S3 calculates the MD5 digest of each individual part as it is uploaded. The MD5 digests are used to determine the ETag for the final object. Amazon S3 concatenates the bytes for the MD5 digests together and then calculates the MD5 digest of these concatenated values. The final step for creating the ETag is when Amazon S3 adds a dash with the total number of parts to the end.

https://docs.aws.amazon.com/AmazonS3/latest/userguide/checking-object-integrity.html
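
The procedure in that quote can be sketched as follows. The function name `multipart_etag` is made up for this note, and the sketch assumes fixed-size parts with only the last one shorter. For the 10 GiB object above, 10737418240 bytes / 640 parts = 16777216 bytes, so its parts appear to have been 16 MiB each.

```python
import hashlib


def multipart_etag(data: bytes, part_size: int) -> str:
    """Compute an S3-style multipart ETag for data split into part_size chunks."""
    parts = [data[i:i + part_size] for i in range(0, len(data), part_size)]
    # MD5 of each part, kept as raw 16-byte digests (not hex strings).
    digests = [hashlib.md5(part).digest() for part in parts]
    # MD5 over the concatenated digests, then a dash and the part count.
    combined = hashlib.md5(b"".join(digests)).hexdigest()
    return combined + "-" + str(len(parts))


# Tiny demo: 10 bytes in 4-byte parts gives 3 parts, so the ETag ends in "-3".
print(multipart_etag(b"a" * 10, 4))
```

To reproduce a real ETag, the part size has to match the one used for the upload. Even an object uploaded as a single part through multipart gets a "-1" ETag that differs from its plain MD5.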

aldy120 commented 1 year ago

The console automatically uses multipart for objects of 16 MB or more, so the ETag and checksums may not match the values computed over the whole file.

aldy120 commented 1 year ago

https://aws.amazon.com/blogs/media/building-scalable-checksums/