Closed: coolderli closed this issue 2 years ago
@coolderli is my understanding below correct?
Regarding getMaxCommittedCheckpointId, I guess you are saying that the snapshot containing the last committed checkpointId expired and hence getMaxCommittedCheckpointId returns -1. I agree that in this case we shouldn't try to execute commitUpToCheckpoint, since we don't know for sure what the last committed checkpoint id is.
It can result in duplicates in this case if the manifest file wasn't cleaned up after a successful commit. Since the checkpointed manifest file was deleted here, we can also conclude that the last Iceberg commit had already succeeded.
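For context, here is a simplified sketch of how the committer derives the max committed checkpoint id. It is close in spirit to the IcebergFilesCommitter code linked further down in this thread, but not a verbatim copy, and the summary-key constants are assumptions based on what the Flink sink writes into each commit's snapshot summary. It walks the snapshot chain backwards and falls back to -1 once no matching snapshot is found, which is exactly what happens when those snapshots have expired:

```java
import java.util.Map;
import org.apache.iceberg.Snapshot;
import org.apache.iceberg.Table;

class MaxCommittedCheckpointLookup {
  private static final String FLINK_JOB_ID = "flink.job-id";
  private static final String MAX_COMMITTED_CHECKPOINT_ID = "flink.max-committed-checkpoint-id";
  private static final long INITIAL_CHECKPOINT_ID = -1L;

  /** Walk from the current snapshot back through its parents. */
  static long getMaxCommittedCheckpointId(Table table, String flinkJobId) {
    Snapshot snapshot = table.currentSnapshot();

    while (snapshot != null) {
      Map<String, String> summary = snapshot.summary();
      if (flinkJobId.equals(summary.get(FLINK_JOB_ID))) {
        String value = summary.get(MAX_COMMITTED_CHECKPOINT_ID);
        if (value != null) {
          return Long.parseLong(value);
        }
      }
      Long parentId = snapshot.parentId();
      snapshot = parentId != null ? table.snapshot(parentId) : null;
    }

    // No snapshot written by this job is reachable any more, e.g. because the
    // snapshots expired or the table was replaced; the committer cannot tell
    // what was already committed and returns the initial value.
    return INITIAL_CHECKPOINT_ID;
  }
}
```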
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'.
Has this problem been solved? I also hit this problem: the Iceberg commit succeeded, but Flink failed to flush the snapshot state to the state backend. When I restarted the task, it failed with FileNotFoundException: File does not exist.
My Flink job failed to restore from the checkpoint and threw the exception below:
After verification, I found that the downstream table had changed. In the current implementation, we query the snapshot history to find the max committed checkpoint-id: https://github.com/apache/iceberg/blob/master/flink/v1.14/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergFilesCommitter.java#L369
If the downstream table has changed, the value of getMaxCommittedCheckpointId is unpredictable. I think we can store the table UUID in the checkpointed state. When restoring, we can use the table UUID to validate the downstream table and throw an exception if the UUIDs are inconsistent. What do you think about this? @stevenzwu @rdblue
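To make the proposal concrete, here is a rough sketch of the UUID check, assuming the committer keeps the table UUID in its Flink operator state. The class name, the state descriptor name `iceberg-table-uuid`, and the placement in `initializeState()` are illustrative assumptions, not existing Iceberg code:

```java
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.runtime.state.StateInitializationContext;
import org.apache.iceberg.BaseTable;
import org.apache.iceberg.Table;

class TableUuidValidator {

  // Hypothetical state handle holding the UUID of the table the job last committed to.
  private static final ListStateDescriptor<String> TABLE_UUID_DESCRIPTOR =
      new ListStateDescriptor<>("iceberg-table-uuid", String.class);

  private ListState<String> tableUuidState;

  /** Reads the UUID of the live table from its current metadata. */
  static String currentTableUuid(Table table) {
    return ((BaseTable) table).operations().current().uuid();
  }

  /** Called from initializeState(): fail fast if the table was dropped and recreated. */
  void validateOnRestore(StateInitializationContext context, Table table) throws Exception {
    tableUuidState = context.getOperatorStateStore().getListState(TABLE_UUID_DESCRIPTOR);

    if (context.isRestored()) {
      String liveUuid = currentTableUuid(table);
      for (String checkpointedUuid : tableUuidState.get()) {
        if (!checkpointedUuid.equals(liveUuid)) {
          throw new IllegalStateException(
              "Table UUID changed between checkpoint and restore: checkpointed="
                  + checkpointedUuid + ", current=" + liveUuid);
        }
      }
    }
    // snapshotState() would rewrite currentTableUuid(table) into tableUuidState on
    // every checkpoint, so the stored UUID always tracks the table being committed to.
  }
}
```

With a check like this, a job restoring against a recreated table fails with a clear error instead of silently trusting whatever getMaxCommittedCheckpointId happens to return for the new table.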