Ytimetravel opened this issue 2 months ago
The update to the properties file should be atomic, and we already do that in HoodieTableConfig.modify; it just throws for the writer if any exception happens, while the reader would still work by reading the backup file.
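To make that concrete, here is a minimal sketch of the backup-then-rewrite protocol being described (the method and path names are illustrative, not the exact HoodieTableConfig.modify internals):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Properties;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class PropertiesUpdateSketch {
  // Sketch: update hoodie.properties via a backup so a reader always has one intact copy.
  public static void modifyProperties(FileSystem fs, Path metaDir, Properties updated) throws IOException {
    Path cfg = new Path(metaDir, "hoodie.properties");
    Path backup = new Path(metaDir, "hoodie.properties.backup");

    // 1. Persist a known-good copy before touching the original.
    try (InputStream in = fs.open(cfg); OutputStream out = fs.create(backup, true)) {
      IOUtils.copyBytes(in, out, 4096, false);
    }

    // 2. Rewrite the original. If close() is interrupted (e.g. the HDFS pipeline
    //    failure shown in the stacktrace below), the file can be left truncated (len = 0).
    try (OutputStream out = fs.create(cfg, true)) {
      updated.store(out, "updated");
    }

    // 3. Drop the backup only after the rewrite fully succeeded; until then a reader
    //    that finds a corrupt hoodie.properties can fall back to the backup.
    fs.delete(backup, false);
  }
}
```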
We need more information to ensure that the hoodie.properties file is correct, rather than simply skipping the recovery and deleting the backup file.
+1 for this, we need to strengthen the handling of properties file exceptions for the invoker.
@danny0405 My current understanding is as follows:
Can we check whether the original properties file is error-free by comparing file sizes?
We have a checksum in the properties file.
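For context, the checksum is what lets a reader (or recovery code) tell an intact hoodie.properties from a truncated one. A minimal sketch, assuming a stored hoodie.table.checksum entry; the exact fields and algorithm used by HoodieTableConfig.generateChecksum may differ:

```java
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import java.util.zip.CRC32;

public class ChecksumSketch {
  // Illustrative only: recompute a CRC32 over a few table properties and compare it
  // with the stored hoodie.table.checksum entry.
  static long generateChecksum(Properties props) {
    String data = props.getProperty("hoodie.table.name", "")
        + props.getProperty("hoodie.table.partition.fields", "");
    CRC32 crc = new CRC32();
    crc.update(data.getBytes(StandardCharsets.UTF_8));
    return crc.getValue();
  }

  static boolean isValid(Properties props) {
    String stored = props.getProperty("hoodie.table.checksum");
    if (stored == null) {
      return false; // a truncated (len = 0) file loads as empty, so no checksum entry
    }
    try {
      return Long.parseLong(stored) == generateChecksum(props);
    } catch (NumberFormatException e) {
      return false; // a garbled value also counts as corrupt
    }
  }
}
```

A file-size comparison alone cannot tell a stale backup from a corrupt original, which is presumably why validating against the stored checksum is the preferred signal here.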
@danny0405 Sounds good. Can I optimize the decision-making process here?
Sure, would be glad to review your fix.
@Ytimetravel Did you get a chance to work on this? Do we have a JIRA for the same?
Sorry, I am not sure I fully understand how exactly we got into the corrupt state.
From what I see, createMetaClient(true) fails. But if we chase the chain of calls, it ends up at https://github.com/apache/hudi/blob/3a57591152065ddb317c5fe67bab8163730f1e73/hudi-common/src/main/java/org/apache/hudi/common/util/ConfigUtils.java#L541
which actually accounts for reading from either the backup or the original properties file.
Can you help me understand a bit more?
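For reference, the read-side fallback being referred to looks roughly like this; a simplified sketch rather than the actual ConfigUtils code (the real path also validates the checksum rather than just checking for its presence):

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FallbackReadSketch {
  // Sketch: prefer hoodie.properties, fall back to the backup when the original is
  // missing or looks corrupt.
  static Properties fetchConfigs(FileSystem fs, Path metaDir) throws IOException {
    Path cfg = new Path(metaDir, "hoodie.properties");
    Path backup = new Path(metaDir, "hoodie.properties.backup");
    Properties props = new Properties();
    if (fs.exists(cfg)) {
      try (InputStream in = fs.open(cfg)) {
        props.load(in);
      }
      // A truncated (len = 0) file loads as an empty Properties, so the checksum key
      // is absent and we fall through to the backup.
      if (props.containsKey("hoodie.table.checksum")) {
        return props;
      }
      props.clear();
    }
    // Original absent or corrupt: use the backup taken before the failed rewrite.
    try (InputStream in = fs.open(backup)) {
      props.load(in);
    }
    return props;
  }
}
```

So the pure read path can survive the corruption; the failure described in this issue comes from the writer-side rollback/upgrade path, which (per the report below) trusts hoodie.properties as soon as it exists and then deletes the backup.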
Describe the problem you faced
Dear community, I recently discovered a case where a write failure can leave the hoodie.properties file corrupted, which then causes other write tasks to fail. The process in which this situation occurs is as follows:
1. Executing the commit triggers the maybeDeleteMetadataTable process (if needed).
2. An exception occurs during this process, causing the hoodie.properties write to fail.
   File status: hoodie.properties corrupted (len = 0), hoodie.properties.backup intact.
3. A rollback is then triggered.
4. Since the table version cannot be correctly obtained at this point, an upgrade from table version 0 to 6 is triggered.
   File status: hoodie.properties corrupted (len = 0), hoodie.properties.backup removed.
I think we should not only check whether the hoodie.properties file exists when performing recoverIfNeeded; we need more information to ensure that the hoodie.properties file is correct, rather than simply skipping the recovery and deleting the backup file. Any suggestions?
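One possible shape for that stricter check, sketched under the assumption that a non-empty, loadable file carrying a checksum is the signal for "correct" (names here are hypothetical, not a patch):

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class RecoverSketch {
  // Sketch: before trusting hoodie.properties and dropping the backup, verify the file
  // is non-empty, loadable, and carries a checksum; otherwise restore from the backup.
  static void recoverIfNeeded(FileSystem fs, Path cfg, Path backup) throws IOException {
    boolean cfgHealthy = false;
    // In the sequence above the file exists but has len = 0, so it is not considered healthy.
    if (fs.exists(cfg) && fs.getFileStatus(cfg).getLen() > 0) {
      Properties props = new Properties();
      try (InputStream in = fs.open(cfg)) {
        props.load(in);
      }
      // Ideally this would re-validate the full checksum, not just its presence.
      cfgHealthy = props.containsKey("hoodie.table.checksum");
    }
    if (!cfgHealthy && fs.exists(backup)) {
      // Restore the last known-good copy instead of silently dropping it.
      fs.delete(cfg, false);
      FileUtil.copy(fs, backup, fs, cfg, false, fs.getConf());
    }
    if (fs.exists(backup)) {
      fs.delete(backup, false);
    }
  }
}
```

With a check like this, the len = 0 file in the sequence above would fail validation, and the upgrade path would restore hoodie.properties from the backup instead of deleting it.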
Environment Description
Hudi version: 0.14.0
Spark version: 2.4
Hadoop version: 2.6
Storage (HDFS/S3/GCS..): HDFS
Stacktrace
Caused by: org.apache.hudi.exception.HoodieException: Error updating table configs.
  at org.apache.hudi.internal.DataSourceInternalWriterHelper.commit(DataSourceInternalWriterHelper.java:91)
  at org.apache.hudi.internal.HoodieDataSourceInternalWriter.commit(HoodieDataSourceInternalWriter.java:91)
  at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2Exec.scala:76)
  ... 69 more
  Suppressed: java.lang.IllegalArgumentException: hoodie.table.name property needs to be specified
    at org.apache.hudi.common.table.HoodieTableConfig.generateChecksum(HoodieTableConfig.java:523)
    at org.apache.hudi.common.table.HoodieTableConfig.getOrderedPropertiesWithTableChecksum(HoodieTableConfig.java:321)
    at org.apache.hudi.common.table.HoodieTableConfig.storeProperties(HoodieTableConfig.java:339)
    at org.apache.hudi.common.table.HoodieTableConfig.modify(HoodieTableConfig.java:438)
    at org.apache.hudi.common.table.HoodieTableConfig.delete(HoodieTableConfig.java:481)
    at org.apache.hudi.table.upgrade.UpgradeDowngrade.run(UpgradeDowngrade.java:151)
    at org.apache.hudi.client.BaseHoodieWriteClient.tryUpgrade(BaseHoodieWriteClient.java:1399)
    at org.apache.hudi.client.BaseHoodieWriteClient.doInitTable(BaseHoodieWriteClient.java:1255)
    at org.apache.hudi.client.BaseHoodieWriteClient.initTable(BaseHoodieWriteClient.java:1296)
    at org.apache.hudi.client.BaseHoodieWriteClient.rollback(BaseHoodieWriteClient.java:769)
    at org.apache.hudi.internal.DataSourceInternalWriterHelper.abort(DataSourceInternalWriterHelper.java:99)
    at org.apache.hudi.internal.HoodieDataSourceInternalWriter.abort(HoodieDataSourceInternalWriter.java:96)
    at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2Exec.scala:82)
    ... 69 more
Caused by: org.apache.hudi.exception.HoodieIOException: Error updating table configs.
  at org.apache.hudi.common.table.HoodieTableConfig.modify(HoodieTableConfig.java:466)
  at org.apache.hudi.common.table.HoodieTableConfig.update(HoodieTableConfig.java:475)
  at org.apache.hudi.common.table.HoodieTableConfig.setMetadataPartitionState(HoodieTableConfig.java:816)
  at org.apache.hudi.common.table.HoodieTableConfig.clearMetadataPartitions(HoodieTableConfig.java:847)
  at org.apache.hudi.metadata.HoodieTableMetadataUtil.deleteMetadataTable(HoodieTableMetadataUtil.java:1396)
  at org.apache.hudi.metadata.HoodieTableMetadataUtil.deleteMetadataTable(HoodieTableMetadataUtil.java:275)
  at org.apache.hudi.table.HoodieTable.maybeDeleteMetadataTable(HoodieTable.java:995)
  at org.apache.hudi.table.HoodieSparkTable.getMetadataWriter(HoodieSparkTable.java:116)
  at org.apache.hudi.table.HoodieTable.getMetadataWriter(HoodieTable.java:947)
  at org.apache.hudi.client.BaseHoodieWriteClient.writeTableMetadata(BaseHoodieWriteClient.java:359)
  at org.apache.hudi.client.BaseHoodieWriteClient.commit(BaseHoodieWriteClient.java:285)
  at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:236)
  at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:211)
  at org.apache.hudi.internal.DataSourceInternalWriterHelper.commit(DataSourceInternalWriterHelper.java:88)
  ... 71 more
Caused by: java.io.InterruptedIOException: Interrupted while waiting for data to be acknowledged by pipeline
  at org.apache.hadoop.hdfs.DFSOutputStream.waitForAckedSeqno(DFSOutputStream.java:3520)
  at org.apache.hadoop.hdfs.DFSOutputStream.flushInternal(DFSOutputStream.java:3498)
  at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:3690)
  at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:3625)
  at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:80)
  at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:115)
  at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:80)
  at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:115)
  at org.apache.hudi.common.fs.SizeAwareFSDataOutputStream.close(SizeAwareFSDataOutputStream.java:75)
  at org.apache.hudi.common.table.HoodieTableConfig.modify(HoodieTableConfig.java:449)
  ... 84 more