The very first metadata.json, .avro and snapshot .avro files created for my Iceberg table (Athena with the Glue catalog) were deleted because I have a TTL on my S3 bucket. I still have all the more recent files in the metadata/ and data/ folders, but I cannot use the table anymore in any way. This is surprising: I would expect Iceberg to either maintain a single metadata file, fall back to the most recently generated metadata, or at least be able to refresh the metadata files. Losing only the very oldest files breaks the whole table, which is really bad. More details below.
When I run an Athena query on the Iceberg table, the error I currently get is:
GENERIC_INTERNAL_ERROR: io.trino.hdfs.s3.TrinoS3FileSystem$UnrecoverableS3OperationException: com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: VW53B9PKC8FVD3G6; S3 Extended Request ID: C0aljha+rUMGEYZQ/oA5QVF3/Ggsg17YTuEDQOFUabWcJGxjEXb0vZ9zMcqNwml/GOy7Ka8D4UDwU5lrqBDKTg==; Proxy: null), S3 Extended Request ID: C0aljha+rUMGEYZQ/oA5QVF3/Ggsg17YTuEDQOFUabWcJGxjEXb0vZ9zMcqNwml/GOy7Ka8D4UDwU5lrqBDKTg== (Bucket: athena-xxx-output-stage, Key: my_athena_path_xxxxxx/metadata/b6f6cbf8-774e-4161-8568-6b3e43ac6920-m0.avro) This query ran against the ‘xxxx’ database, unless qualified by the query. Please post the error message on our [forum ](https://forums.aws.amazon.com/forum.jspa?forumID=242&start=0) or contact [customer support ](https://eu-west-1.console.aws.amazon.com/support/home?#/case/create?issueType=technical&serviceCode=amazon-athena&categoryCode=query-related-issue) with Query ID: 53761fa2-b802-4417-b0c0-983a49686816
The missing file b6f6cbf8-774e-4161-8568-6b3e43ac6920-m0.avro is very old and was deleted by the S3 TTL. I still have more recent .avro files, but it seems the latest metadata.json still points to the deleted file through the manifest-list of the oldest snapshot (sequence-number = 1) in its snapshots list, see below:
"snapshots" : [ { "sequence-number" : 1, "snapshot-id" : 95661809200085951, "timestamp-ms" : 1713142581530, "summary" : { "operation" : "append", "trino_query_id" : "20240415_005553_00056_gf7ew", "added-data-files" : "27", "added-records" : "5049", "added-files-size" : "330610", "changed-partition-count" : "1", "total-records" : "5049", "total-files-size" : "330610", "total-data-files" : "27", "total-delete-files" : "0", "total-position-deletes" : "0", "total-equality-deletes" : "0" }, "manifest-list" : "s3://athena-xxxx-output-stage/athena_output/config_xxxx/metadata/snap-95661809200085951-1-b6f6cbf8-774e-4161-8568-6b3e43ac6920.avro", "schema-id" : 0
I tried setting the vacuum properties this way:
`ALTER TABLE iceberg_table SET TBLPROPERTIES (
'vacuum_max_snapshot_age_seconds'='xxxxx'
);
VACUUM iceberg_table;`
That removed the reference from the newest metadata file, but I think the new file still references the oldest metadata.json file in this list:
"metadata-log" : [ { "timestamp-ms" : 1713142581530, "metadata-file" : "s3://athena-...... }, { "timestamp-ms" : 1713148787530, "metadata-file" : "s3://athena-...... }, { "timestamp-ms" : 1713187881530, "metadata-file" : "s3://athena-...... }, {
I also tried a REFRESH operation, even through the Iceberg API, but I still get the same original error about the missing .avro file.
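The `metadata-log` entries mentioned above can be checked in the same spirit, by listing which of the referenced metadata.json files are gone. This is a minimal sketch, not a confirmed diagnosis: the function name, the placeholder paths and the `exists` callback are hypothetical, and it is an open question here whether a missing `metadata-log` entry breaks reads by itself.

```python
from typing import Callable, Dict, List, Tuple


def missing_metadata_log_entries(
    metadata: Dict, exists: Callable[[str], bool]
) -> List[Tuple[int, str]]:
    """Return (timestamp-ms, metadata-file) pairs from `metadata-log` whose file is gone."""
    return [
        (entry["timestamp-ms"], entry["metadata-file"])
        for entry in metadata.get("metadata-log", [])
        if not exists(entry["metadata-file"])
    ]


# Hypothetical log: timestamps come from the excerpt above, paths are placeholders.
sample_log = {
    "metadata-log": [
        {"timestamp-ms": 1713142581530,
         "metadata-file": "s3://example-bucket/tbl/metadata/00000.metadata.json"},
        {"timestamp-ms": 1713148787530,
         "metadata-file": "s3://example-bucket/tbl/metadata/00001.metadata.json"},
    ]
}

# Stand-in for a real S3 existence check: the oldest file was removed by the TTL.
still_present = {"s3://example-bucket/tbl/metadata/00001.metadata.json"}

print(missing_metadata_log_entries(sample_log, lambda p: p in still_present))
```

Passing the check in as a callback keeps the sketch independent of any particular S3 client.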
I would really appreciate any help here.
I hit this problem in prod, and the team is now considering dropping Iceberg and going back to standard Athena tables.
The S3 bucket has a TTL for GDPR reasons.
### PS: I managed to reproduce the error on another table by simply deleting the oldest metadata.json, .avro and snapshot .avro files
Apache Iceberg version
None
Query engine
Athena