aldy120 / s3-note

Note for Amazon S3
0 stars 0 forks source link

Analyze S3 inventory report with Athena #22

Open aldy120 opened 2 years ago

aldy120 commented 2 years ago

參考 https://docs.aws.amazon.com/AmazonS3/latest/userguide/storage-inventory-athena-query.html 發現不太好用,最後修了一個自己可以用的版本。

Create table

Inventory report with csv format, versioning disabled.

CREATE EXTERNAL TABLE inventory(
bucket string,
key string,
size bigint,
last_modified_date string,
e_tag string,
storage_class string,
is_multipart_uploaded boolean,
replication_status string,
encryption_status string
)
PARTITIONED BY (dt string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
OUTPUTFORMAT  'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION 's3://logs-dub-12345678/test-dub-12345678/inventory/hive/'

Partition

MSCK REPAIR TABLE inventory;

Error

有機會出現錯誤 HIVE_BAD_DATA 通常是因為格式跟上面建表的指定資料型別不同。

統計目錄中的檔案總大小

SELECT sum(size) as size FROM inventory WHERE dt = '2021-07-15-00-00' and key like 'images/%'