RemoveOrphanFiles works only with Hadoop FS/IO, when run locally with the Hadoop catalog. When I try to run it on S3 files using the Glue catalog from EMR, it throws the error below. I have tried Iceberg 0.11 and 0.12 with Spark 3.0.1 and 3.1.1 (all combinations), and I have tried the call from both the Actions API and the SparkActions API. The result does not change.
Actions.forTable(table).removeOrphanFiles().olderThan(removeOrphanFilesOlderThan).execute();
or
SparkActions.get().deleteOrphanFiles(table).olderThan(removeOrphanFilesOlderThan).execute();
The error is:
21/08/31 05:40:36 ERROR RemoveOrphanFilesMaintenanceJob: Error in RemoveOrphanFilesMaintenanceJob - removeOrphanFilesOlderThanTimestamp, Illegal Arguments in table properties - Can't parse null value from table properties, tenant: tenantId1, table: lakehouse_database.mobiletest1, removeOrphanFilesOlderThan: 1630388136606, Status: Failed, Reason: {}.
java.lang.IllegalArgumentException: Cannot find the metadata table for glue_catalog.lakehouse_database.mobiletest1 of type ALL_MANIFESTS
at org.apache.iceberg.spark.SparkTableUtil.loadMetadataTable(SparkTableUtil.java:634)
at org.apache.iceberg.spark.actions.BaseSparkAction.loadMetadataTable(BaseSparkAction.java:153)
at org.apache.iceberg.spark.actions.BaseSparkAction.buildValidDataFileDF(BaseSparkAction.java:119)
at org.apache.iceberg.spark.actions.BaseDeleteOrphanFilesSparkAction.doExecute(BaseDeleteOrphanFilesSparkAction.java:154)
at org.apache.iceberg.spark.actions.BaseSparkAction.withJobGroupInfo(BaseSparkAction.java:99)
at org.apache.iceberg.spark.actions.BaseDeleteOrphanFilesSparkAction.execute(BaseDeleteOrphanFilesSparkAction.java:141)
at org.apache.iceberg.spark.actions.BaseDeleteOrphanFilesSparkAction.execute(BaseDeleteOrphanFilesSparkAction.java:76)
at org.apache.iceberg.actions.RemoveOrphanFilesAction.execute(RemoveOrphanFilesAction.java:87)
at com.salesforce.cdp.lakehouse.spark.tablemaintenance.job.RemoveOrphanFilesMaintenanceJob.removeOrphanFilesOlderThanTimestamp(RemoveOrphanFilesMaintenanceJob.java:273)
at com.salesforce.cdp.lakehouse.spark.tablemaintenance.job.RemoveOrphanFilesMaintenanceJob.removeOrphanFiles(RemoveOrphanFilesMaintenanceJob.java:133)
at com.salesforce.cdp.lakehouse.spark.tablemaintenance.job.RemoveOrphanFilesMaintenanceJob.maintain(RemoveOrphanFilesMaintenanceJob.java:58)
at com.salesforce.cdp.lakehouse.spark.tablemaintenance.LakeHouseTableMaintenanceJob.run(LakeHouseTableMaintenanceJob.java:136)
at com.salesforce.cdp.spark.core.job.SparkJob.submitAndRun(SparkJob.java:76)
at com.salesforce.cdp.lakehouse.spark.tablemaintenance.LakeHouseTableMaintenanceJob.main(LakeHouseTableMaintenanceJob.java:236)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:735)
Is this a problem with my implementation, or is it a bug in Iceberg? Or am I missing something here? Please help!
Thanks, Raghu