What happened?
When running orphan file cleaning (clean-orphan-file) on an Iceberg table stored on OSS, normal data files that have just been written but not yet committed to Iceberg are deleted as orphans. The clean-orphan-file.min-existing-time-minutes parameter does not take effect.
Affects Versions
master
What table format are you seeing the problem on?
Iceberg
What engines are you seeing the problem on?
AMS
How to reproduce
Create an Iceberg table using OSS, then set clean-orphan-file.enabled to true.
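For reference, a minimal configuration sketch of the reproduction setup (the table name is a placeholder; this assumes the properties are set via Spark SQL against an Amoro-managed catalog, and the retention value shown is only an example):

```sql
-- Enable orphan file cleaning on the table (placeholder table name)
ALTER TABLE db.sample SET TBLPROPERTIES (
  'clean-orphan-file.enabled' = 'true',
  -- Example retention window; files newer than this should be kept,
  -- but on OSS this threshold is not respected
  'clean-orphan-file.min-existing-time-minutes' = '2880'
);
```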
Relevant log output
org.apache.iceberg.exceptions.NotFoundException: File does not exist: oss://xxxxxx/user/hive/warehouse/dev_game_ods.db/xxxxxx_log/data/event_time.string_trunc=2024-06-12/log_type.string=xxxxxxxxxx/00000-0-9aa7ee01-e2ca-46bd-8884-e144c8d1528b-45994.parquet
at org.apache.iceberg.hadoop.HadoopInputFile.lazyStat(HadoopInputFile.java:164)
at org.apache.iceberg.hadoop.HadoopInputFile.getStat(HadoopInputFile.java:200)
at org.apache.iceberg.parquet.ParquetIO.file(ParquetIO.java:51)
at org.apache.iceberg.parquet.ReadConf.newReader(ReadConf.java:238)
at org.apache.iceberg.parquet.ReadConf.<init>(ReadConf.java:81)
at org.apache.iceberg.parquet.ParquetReader.init(ParquetReader.java:71)
at org.apache.iceberg.parquet.ParquetReader.iterator(ParquetReader.java:91)
at org.apache.iceberg.io.CloseableIterable$ConcatCloseableIterable$ConcatCloseableIterator.hasNext(CloseableIterable.java:257)
at org.apache.iceberg.io.CloseableIterable$7$1.hasNext(CloseableIterable.java:197)
at org.apache.iceberg.io.CloseableIterable$7$1.hasNext(CloseableIterable.java:197)
at org.apache.amoro.optimizing.AbstractRewriteFilesExecutor.rewriterDataFiles(AbstractRewriteFilesExecutor.java:150)
at org.apache.amoro.table.TableMetaStore.call(TableMetaStore.java:234)
at org.apache.amoro.table.TableMetaStore.lambda$doAs$0(TableMetaStore.java:209)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:360)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1855)
at org.apache.amoro.table.TableMetaStore.doAs(TableMetaStore.java:209)
at org.apache.amoro.io.AuthenticatedHadoopFileIO.doAs(AuthenticatedHadoopFileIO.java:202)
at org.apache.amoro.optimizing.AbstractRewriteFilesExecutor.execute(AbstractRewriteFilesExecutor.java:108)
at org.apache.amoro.optimizing.AbstractRewriteFilesExecutor.execute(AbstractRewriteFilesExecutor.java:64)
at org.apache.amoro.optimizer.common.OptimizerExecutor.executeTask(OptimizerExecutor.java:149)
at org.apache.amoro.optimizer.spark.SparkOptimizingTaskFunction.call(SparkOptimizingTaskFunction.java:45)
at org.apache.amoro.optimizer.spark.SparkOptimizingTaskFunction.call(SparkOptimizingTaskFunction.java:33)
at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
Anything else
No response
Are you willing to submit a PR?
[X] Yes I am willing to submit a PR!
Code of Conduct
[X] I agree to follow this project's Code of Conduct