aldy120 / s3-note

Note for Amazon S3

Existing object replication #41

Open aldy120 opened 1 year ago

aldy120 commented 1 year ago

https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-batch-replication-batch.html Only works when going through replication and the destination is a new one.

Consideration

Actual test results

aldy120 commented 1 year ago

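I found that if I delete and recreate the same replication rule and reuse the same destination bucket for existing objects replication, the second run reports success but does not actually replicate the objects. Switching the destination to a new bucket that has never been used in a replication rule for this source bucket makes the replication succeed.

The related documentation indicates that replicating to the same destination bucket again is not supported.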

Batch Replication does not support re-replicating objects that were deleted with the version ID of the object from the destination bucket.

Replicating existing objects with S3 Batch Replication - S3 Batch Replication considerations - https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-batch-replication-batch.html#batch-replication-considerations

aldy120 commented 1 year ago

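The cross-account approach works:

  1. First use the replication rule to generate a manifest
  2. Use that manifest to run a Batch Operations job

Be careful: the bucket name inside the manifest and the checksum in the separate checksum file both have to be updated. Otherwise Batch Operations cannot read the manifest and fails with an access-denied style error message.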
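
A minimal sketch of that manifest fix, assuming manifest.json and manifest.checksum sit together in a local directory, and that the checksum file holds only the hex MD5 of manifest.json with no trailing newline (the MD5 part is noted further down this thread; the exact file layout is my assumption). The function name `retarget_manifest` is hypothetical.

```python
import hashlib
import json
from pathlib import Path


def retarget_manifest(manifest_dir: str, new_destination: str) -> str:
    """Point manifest.json at a new bucket and refresh manifest.checksum.

    Returns the hex MD5 digest written to manifest.checksum.
    """
    manifest_path = Path(manifest_dir) / "manifest.json"
    checksum_path = Path(manifest_dir) / "manifest.checksum"

    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    manifest["destinationBucket"] = new_destination

    # Serialize once, then hash exactly the bytes that are written to disk,
    # so the checksum always matches the file that gets uploaded.
    payload = json.dumps(manifest, indent=2).encode("utf-8")
    manifest_path.write_bytes(payload)

    digest = hashlib.md5(payload).hexdigest()
    checksum_path.write_text(digest, encoding="utf-8")
    return digest
```

Something like `retarget_manifest("./manifest", "arn:aws:s3:::my-inventory-bucket")` would then be run before uploading both files. The ARN form of the value is an assumption; keep whatever form the generated manifest already uses.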

Steps:

  1. Check the data location listed in manifest.json. Create that directory and put the gz file in it.
  2. Move manifest.json and manifest.checksum into the bucket
  3. Add the role policy (based on the blog post)
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowBatchOperationsDestinationObjectCOPY",
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:PutObjectVersionAcl",
                    "s3:PutObjectAcl",
                    "s3:PutObjectVersionTagging",
                    "s3:PutObjectTagging",
                    "s3:GetObject",
                    "s3:GetObjectVersion",
                    "s3:GetObjectAcl",
                    "s3:GetObjectTagging",
                    "s3:GetObjectVersionAcl",
                    "s3:GetObjectVersionTagging"
                ],
                "Resource": [
                    "arn:aws:s3:::<DESTINATION_BUCKET>/*",
                    "arn:aws:s3:::<SOURCE_BUCKET>/*",
                    "arn:aws:s3:::<DESTINATION_INVENTORY>/*",
                    "arn:aws:s3:::<BATCH_REPORT_DESTINATION>/*"
                ]
            }
        ]
    }
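
For step 1 above, a small sketch that reads manifest.json and reports which prefix each gz data file has to live under. It assumes the manifest follows the S3 Inventory layout, with a `files` array whose entries carry a `key` field; I have not confirmed that every generated manifest looks like this. The function name `data_file_prefixes` is hypothetical.

```python
import json
from pathlib import PurePosixPath


def data_file_prefixes(manifest_text: str) -> dict:
    """Map each data file key in the manifest to the prefix (directory) it belongs in."""
    manifest = json.loads(manifest_text)
    return {
        entry["key"]: str(PurePosixPath(entry["key"]).parent)
        # A manifest without a "files" array yields an empty mapping.
        for entry in manifest.get("files", [])
    }
```

The returned prefixes are the directories to create in the new bucket before copying the gz files over, so that the keys recorded in the manifest still resolve.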

At this point, creating the job fails with the error: Reading the manifest is forbidden: AccessDenied

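  1. Change two things: in manifest.json, set destinationBucket to the bucket the inventory was moved to; in manifest.checksum, put md5(manifest.json), or the object's ETag (the two match for a single-part upload that is not encrypted with SSE-KMS).

  2. Update the source bucket policy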

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowBatchOperationsSourceObjectCOPY",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::DestinationAccountNumber:role/BatchOperationsDestinationRoleCOPY"
            },
            "Action": [
                "s3:GetObject",
                "s3:GetObjectVersion",
                "s3:GetObjectAcl",
                "s3:GetObjectTagging",
                "s3:GetObjectVersionAcl",
                "s3:GetObjectVersionTagging"
            ],
            "Resource": "arn:aws:s3:::ObjectSourceBucket/*"
        }
    ]
}
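
To avoid hand-editing the placeholders, the source bucket policy above can be generated. This is only a sketch: `source_bucket_policy` and its parameter names are hypothetical, and the action list is copied from the policy above rather than checked against what a given job actually needs.

```python
import json

# Read-only object actions, copied from the source bucket policy above.
READ_ACTIONS = [
    "s3:GetObject",
    "s3:GetObjectVersion",
    "s3:GetObjectAcl",
    "s3:GetObjectTagging",
    "s3:GetObjectVersionAcl",
    "s3:GetObjectVersionTagging",
]


def source_bucket_policy(source_bucket: str, destination_account_id: str, role_name: str) -> dict:
    """Build the bucket policy that lets the destination account's role read source objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowBatchOperationsSourceObjectCOPY",
                "Effect": "Allow",
                "Principal": {
                    "AWS": f"arn:aws:iam::{destination_account_id}:role/{role_name}"
                },
                "Action": READ_ACTIONS,
                "Resource": f"arn:aws:s3:::{source_bucket}/*",
            }
        ],
    }
```

`print(json.dumps(source_bucket_policy("my-source", "111122223333", "BatchOperationsDestinationRoleCOPY"), indent=4))` prints a document that can be pasted into the bucket policy editor; the bucket name and account ID here are made-up values.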

Remember to run the job (Run job).