Werepyrex10 opened 3 months ago
I don't think using multiple Delta writers on the same table is a good idea; the whole ecosystem is not mature enough yet. Just use one writer for everything.
@Werepyrex10 I suggest you disable checkpointing in delta-spark or delta-rs for now.
@Werepyrex10 What storage backend is this? If it's S3, is the Databricks cluster using the same S3DynamoDbLogStore configuration as the delta-rs process?
Hey @rtyler, we are using Azure Blob Storage as the storage backend.
We had the same issue: either you configure a DynamoDB lock for the delta log, or you use only one writer. We ended up with the single-writer solution.
I have the following setup:

- a delta-rs (0.17.3) process writing to the table
- a Databricks notebook that runs an `optimize` job on top of the delta table once a day

When both processes decide to create a checkpoint on the same version, there is no failure on writing, since the notebook produces a multi-part checkpoint while the delta-rs process produces a single-part checkpoint. Here is a preview of the result of both operations, as seen in the delta logs:
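For readers unfamiliar with the layout: this is not the actual listing from the issue, just an illustrative sketch of the two checkpoint forms the Delta protocol defines, coexisting at the same version:

```
_delta_log/
  00000000000000000010.checkpoint.parquet                         # single-part (delta-rs)
  00000000000000000010.checkpoint.0000000001.0000000003.parquet   # multi-part (Spark), part 1 of 3
  00000000000000000010.checkpoint.0000000002.0000000003.parquet   # part 2 of 3
  00000000000000000010.checkpoint.0000000003.0000000003.parquet   # part 3 of 3
  _last_checkpoint    # JSON like {"version": 10, "size": ...}; "parts" only for multi-part
```

Whichever writer checkpoints last overwrites `_last_checkpoint`, so the metadata may or may not carry a `parts` field while both file sets remain on disk.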
After this occurs, when trying to open the table with the delta-rs lib, we get the following error:
This is because of the way the library counts the number of checkpoint parts: https://github.com/delta-io/delta-rs/blob/rust-v0.17.3/crates/core/src/kernel/snapshot/log_segment.rs#L447-L452
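To make the failure mode concrete, here is a hypothetical sketch (the names and regex are mine, not delta-rs's) of what goes wrong when a reader lists every checkpoint file for the version named in `_last_checkpoint` and compares the count against an absent `parts` field:

```python
import json
import re

# Delta checkpoint file names: a 20-digit version, then either
# ".checkpoint.parquet" (single-part) or
# ".checkpoint.<10-digit part>.<10-digit parts>.parquet" (multi-part).
CHECKPOINT_RE = re.compile(
    r"^(?P<version>\d{20})\.checkpoint"
    r"(?:\.(?P<part>\d{10})\.(?P<parts>\d{10}))?"
    r"\.parquet$"
)

def checkpoint_files_for(version: int, filenames: list[str]) -> list[str]:
    """Return every checkpoint file in the log dir for a given version."""
    return [
        name for name in filenames
        if (m := CHECKPOINT_RE.match(name)) and int(m.group("version")) == version
    ]

# Version 10 was checkpointed twice: single-part (delta-rs) and
# three-part (the Databricks optimize job). All four files coexist.
log_dir = [
    "00000000000000000010.checkpoint.parquet",
    "00000000000000000010.checkpoint.0000000001.0000000003.parquet",
    "00000000000000000010.checkpoint.0000000002.0000000003.parquet",
    "00000000000000000010.checkpoint.0000000003.0000000003.parquet",
]

files = checkpoint_files_for(10, log_dir)

# Suppose delta-rs wrote _last_checkpoint last, so "parts" is absent
# and the reader expects exactly one file -- but it finds four.
last_checkpoint = json.loads('{"version": 10, "size": 100}')
expected = last_checkpoint.get("parts", 1)
print(f"found {len(files)} files, expected {expected}")
```

The mismatch (4 found vs. 1 expected) is the shape of the error above, regardless of the exact counting code in `log_segment.rs`.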
Should the library ignore the multi-part files if the `_last_checkpoint` file does not have any `parts` specified?