delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0

Generic DeltaTable error: Version mismatch with new schema merge functionality in AWS S3 #2262

Closed liamphmurphy closed 3 months ago

liamphmurphy commented 4 months ago

Environment

Delta-rs version: python v0.16

Binding: Python

Environment:


Bug

What happened:

To test the Rust engine, we cleared out any existing Delta tables in our nonprod environment and switched from the pyarrow engine to the Rust engine with schema merging, using this write_deltalake call:

    write_deltalake(
        s3_path,
        table,
        schema=pyarrow_schema,
        mode="append",
        engine="rust",
        partition_by=["Uid", "date", "hour"],
        schema_mode="merge",
        configuration={"delta.logRetentionDuration": "interval 7 day"},
    )

Even though this is a brand-new Delta table, and after some successful writes, the lambdas eventually started erroring with Generic DeltaTable error: Version mismatch. I believe the error is coming from here: https://github.com/delta-io/delta-rs/blob/3e6a4d61923602d189f559636b3e3e3f61b6a924/crates/core/src/table/state.rs#L192

What you expected to happen:

Especially since we are testing with a fresh table, I'd expect all writes to work (and not just some) even with the new schema merge flag set.

How to reproduce it: I was not able to reproduce with a randomly generated dataset locally, so my guess is it's something more to do with the DynamoDB locking on S3. If you have thoughts on how I could test this better, please let me know.

Note that we have roughly 10 concurrent lambdas that could potentially write to the Delta table. However, before this change we had 50 writing with pyarrow and all was well.

rtyler commented 4 months ago

Does this only manifest with the schema evolution? Or are you able to see errors with append or merge writes as well?

ion-elgreco commented 4 months ago

> Does this only manifest with the schema evolution? Or are you able to see errors with append or merge writes as well?

It happens with any operation when there are concurrent writers and the table state gets updated at the end of the write.
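The failure mode described in this thread can be illustrated with a toy model. This is only a sketch under stated assumptions, not the delta-rs implementation: `ToyLog`, `ToyWriter`, and `VersionMismatch` are hypothetical names, and the "new version must be exactly snapshot version + 1" check is an assumption about what the linked state.rs line enforces. Two writers load the table at the same version; the second writer's commit lands one version later than its snapshot expects, so its post-commit state update fails.

```python
class VersionMismatch(Exception):
    """Raised when a local snapshot cannot be advanced by exactly one version."""


class ToyLog:
    """In-memory stand-in for a table's commit log (hypothetical, not delta-rs)."""

    def __init__(self):
        self.version = -1  # -1 means no commits yet

    def commit(self):
        """Append a commit and return the version it landed at."""
        self.version += 1
        return self.version


class ToyWriter:
    """A writer that snapshots the log version once, when it is created."""

    def __init__(self, log):
        self.log = log
        self.local_version = log.version  # snapshot taken at load time

    def write(self):
        committed = self.log.commit()
        # Naive post-commit state update (assumed behaviour): it only succeeds
        # if our commit directly follows the version we loaded.
        if committed != self.local_version + 1:
            raise VersionMismatch(
                f"snapshot at {self.local_version}, commit landed at {committed}"
            )
        self.local_version = committed
        return committed


def demo():
    """Run two writers that loaded the same snapshot; return what happened."""
    log = ToyLog()
    first, second = ToyWriter(log), ToyWriter(log)  # both snapshot version -1
    first_version = first.write()                   # lands at 0, state update ok
    try:
        second.write()                              # lands at 1, snapshot expects 0
        second_failed = False
    except VersionMismatch:
        second_failed = True
    return first_version, second_failed, log.version
```

In this toy model the second writer's commit is already in the log when the error is raised, so blindly retrying the write after such an error could append the same data twice. Whether that also holds for the real library is something to verify against the table history rather than assume.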