Open aldy120 opened 2 years ago
參考 https://docs.aws.amazon.com/AmazonS3/latest/userguide/storage-inventory-athena-query.html 發現不太好用,最後修了一個自己可以用的版本。
Inventory report with csv format, versioning disabled.
CREATE EXTERNAL TABLE inventory( bucket string, key string, size bigint, last_modified_date string, e_tag string, storage_class string, is_multipart_uploaded boolean, replication_status string, encryption_status string ) PARTITIONED BY (dt string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat' LOCATION 's3://logs-dub-12345678/test-dub-12345678/inventory/hive/'
MSCK REPAIR TABLE inventory;
有機會出現錯誤 HIVE_BAD_DATA 通常是因為格式跟上面建表的指定資料型別不同。
HIVE_BAD_DATA
SELECT sum(size) as size FROM inventory WHERE dt = '2021-07-15-00-00' and key like 'images/%'
參考 https://docs.aws.amazon.com/AmazonS3/latest/userguide/storage-inventory-athena-query.html 發現不太好用,最後修了一個自己可以用的版本。
Create table
Inventory report with csv format, versioning disabled.
Partition
Error
有機會出現錯誤
HIVE_BAD_DATA
通常是因為格式跟上面建表的指定資料型別不同。統計目錄中的檔案總大小