Open shaeqahmed opened 10 months ago
@shaeqahmed are you seeing this log line too?
LOG.warn(
"Received unexpected failure when committing to {}, validating if commit ended up succeeding.",
fullTableName,
persistFailure);
I stumbled upon a similar error. [1] If we don't see this log line, then it seems to me that the retry detector didn't work as expected and didn't attempt to reconcile the commit status. [2] If we do see the log line, it means we saw an inconsistent state even after reconciliation, which will be a bit trickier, as I am not sure how Glue behaves in this case.
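To make the two cases concrete, here is a simplified sketch of the kind of reconciliation that log line refers to: after an unexpected failure, re-read the table's current metadata location and check whether the location we tried to commit is now current or already part of the history. This is not Iceberg's actual implementation; the class, method, and location names are hypothetical and only illustrate the idea.

```java
import java.util.List;

// Hypothetical illustration only; not the real Iceberg or Glue code.
class CommitStatusSketch {
    enum Status { SUCCESS, FAILURE, UNKNOWN }

    // attempted: metadata location we tried to commit
    // current:   location the catalog reports after a refresh (null if the refresh failed)
    // previous:  older metadata locations recorded in the table history
    static Status reconcile(String attempted, String current, List<String> previous) {
        if (current == null) {
            // We could not re-read the catalog, so we cannot tell what happened.
            return Status.UNKNOWN;
        }
        if (attempted.equals(current) || previous.contains(attempted)) {
            // The commit landed even though the client saw a failure.
            return Status.SUCCESS;
        }
        // The catalog never pointed at our metadata file.
        return Status.FAILURE;
    }

    public static void main(String[] args) {
        List<String> history = List.of("v1.metadata.json", "v2.metadata.json");
        System.out.println(reconcile("v3.metadata.json", "v3.metadata.json", history));
        System.out.println(reconcile("v3.metadata.json", "v2.metadata.json", history));
        System.out.println(reconcile("v3.metadata.json", null, history));
    }
}
```

In terms of the two cases above: in [1] a check like this never ran, while in [2] it ran but the catalog state was still inconsistent afterwards.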
Apache Iceberg version
1.4.2 (latest release)
Query engine
None
Please describe the bug 🐞
A similar issue I found, which was supposed to be fixed in an older version: https://github.com/apache/iceberg/issues/7151
We have Java Iceberg code that processes items from a FIFO queue and commits to Iceberg in a single-threaded fashion. I have confirmed that we are not committing to the same table from anywhere else at the same time. However, when doing a few commits back to back, at some point we encountered the following WARN log indicating that Glue detected a concurrent update and that the commit was being retried:
But immediately after this log, while attempting to refresh the Iceberg metadata, there is an Iceberg NotFoundException because the current metadata location doesn't exist or no longer exists.
This has resulted in our table becoming corrupt and the availability of our data lake service being affected until we manually fixed the table by referencing the Glue previous_metadata_location and overriding the invalid current metadata_location with it.
It looks to me that when a CommitFailedException (CFE) occurs, the commit is retried internally, and in any case this should not result in a corrupt table even if all retries fail. Our code looks as follows, as we catch all exceptions:
Is this a bug in the Glue Iceberg code, or how should we protect ourselves from a situation where the Iceberg table is left pointing to an invalid location because of failed commits due to concurrent modifications thrown by Glue?
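For reference, the manual repair described above (falling back to previous_metadata_location when the current metadata file is missing) could be sketched roughly as below. This is only a sketch under assumptions: the class and method names are hypothetical, the file-existence check is passed in as a predicate rather than calling S3 or Glue, and the paths in main are made up for illustration.

```java
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical illustration of the manual fix; not part of Iceberg or the Glue SDK.
class MetadataPointerRepairSketch {
    // Returns the metadata location the table should point to:
    // the current one if its file exists, otherwise the previous one.
    static String chooseLocation(String current, String previous, Predicate<String> exists) {
        if (current != null && exists.test(current)) {
            return current; // table is healthy, nothing to repair
        }
        if (previous != null && exists.test(previous)) {
            return previous; // roll the pointer back to the last known-good file
        }
        throw new IllegalStateException("Neither current nor previous metadata file exists");
    }

    public static void main(String[] args) {
        // Made-up paths: only the older metadata file actually exists.
        Set<String> files = Set.of("s3://example-bucket/tbl/metadata/00001.metadata.json");
        String chosen = chooseLocation(
            "s3://example-bucket/tbl/metadata/00002.metadata.json",
            "s3://example-bucket/tbl/metadata/00001.metadata.json",
            files::contains);
        System.out.println(chosen);
    }
}
```

Rolling back this way discards whatever the missing metadata file would have contained, so it is a stopgap rather than an answer to the question above.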