rad-pat opened this issue 3 months ago (Open)
And just for GCS context, since it may differ from S3 or Azure Blob: an object can't be mutated more than once per second. In this case it looks like the same file is being updated many times in quick succession due to the load from multiple threads.
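The usual way to cope with GCS's per-object mutation rate limit is to retry the write with exponential backoff. The sketch below is illustrative only: `write_fn` is a hypothetical zero-argument callable standing in for the actual object write, and `RuntimeError` is a stand-in for whatever rate-limit exception the real client library raises (typically surfaced as HTTP 429).

```python
import random
import time

def write_with_backoff(write_fn, max_attempts=5):
    # Retry an object write that may hit GCS's one-mutation-per-second
    # limit on a single object. `write_fn` is a hypothetical stand-in
    # for the real write call; RuntimeError stands in for the client's
    # rate-limit exception.
    for attempt in range(max_attempts):
        try:
            return write_fn()
        except RuntimeError:
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff with jitter, capped at 32 seconds.
            time.sleep(min(2 ** attempt + random.random(), 32))
```

With this pattern, a burst of hint-file rewrites from concurrent inserts degrades into a short delay instead of a logged error.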
@dantengsky please let us know if you need anything from us to help resolve this. This has unfortunately turned into a blocker for our go-live ramp.
Thank you for letting us know about this issue.
> Ultimately, it seems that all the data is inserted, but presumably, the error should not be generated.

As you mentioned, the failure to write `last_snapshot_location_hint` does not prevent the transaction from being committed successfully.

The `last_snapshot_location_hint` is written on a best-effort basis. Currently, only the attach table functionality relies on this hint, and it may read stale data (or fail to read) if the hint file has not been successfully written. Normal table scans are not affected, as they do not depend on this hint file.

A new setting will be added to allow disabling the writing of `last_snapshot_location_hint` if needed.
Search before asking
Version
v1.2.597-nightly
What's Wrong?
The following error is logged in the Databend Query pod when inserting into a table from multiple threads. Ultimately, it seems that all the data is inserted, but presumably the error should not be generated.
How to Reproduce?
The script below is Python, but a multi-threaded insert into a table should be easy to replicate in other languages:
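The original reproduction script is not reproduced here, so the following is only a minimal sketch of the shape of the workload: several threads inserting into the same table concurrently. `insert_rows`, `NUM_THREADS`, and `ROWS_PER_THREAD` are hypothetical names; in the real reproduction, `insert_rows` would execute an `INSERT INTO ... VALUES ...` through a Databend connection opened per thread.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_THREADS = 8        # hypothetical: concurrent writers
ROWS_PER_THREAD = 1000 # hypothetical: rows per writer

def insert_rows(thread_id: int) -> int:
    # Placeholder for the real client call, e.g. running
    # "INSERT INTO t VALUES ..." on a per-thread Databend connection.
    # Here we only build the batch and report its size.
    rows = [(thread_id, i) for i in range(ROWS_PER_THREAD)]
    return len(rows)

def main() -> int:
    # Fire concurrent inserts against the same table; on GCS, the rapid
    # rewrites of the snapshot hint file from many committers is what
    # trips the one-mutation-per-second object limit.
    with ThreadPoolExecutor(max_workers=NUM_THREADS) as pool:
        inserted = sum(pool.map(insert_rows, range(NUM_THREADS)))
    return inserted

if __name__ == "__main__":
    print(main())
```

Against a real cluster, each worker would hold its own connection, since most client connections are not safe to share across threads.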
Are you willing to submit PR?