delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0

Concurrent delete raises exception but performs delete #2509

Open echai58 opened 6 months ago

echai58 commented 6 months ago

Environment

Delta-rs version: 0.17.3

Binding: python


Bug

What happened: When a write (append) and a delete run concurrently, the delete operation raises DeltaError: Generic DeltaTable error: Version mismatch, yet the delete is still performed.

What you expected to happen: The reported outcome should match what actually happened to the table. Either behavior would be acceptable: the concurrent delete fails with an exception, or it succeeds without raising one.

How to reproduce it:

from deltalake import DeltaTable, write_deltalake
import pandas as pd

path = "test-concurrent"

df = pd.DataFrame.from_dict(
    {
        "k": [1],
        "v": [1],
    }
)

write_deltalake(
    path,
    df,
    mode="overwrite"
)

# Loading both DeltaTable handles before the append simulates concurrent writers:
# table_2 still points at version 0 when it runs the delete.
table_1 = DeltaTable(path)
table_2 = DeltaTable(path)

data_1 = pd.DataFrame.from_dict(
    {
        "k": [3],
        "v": [-3],
    }
)

write_deltalake(
    path,
    data_1,
    mode="append"
)

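# On 0.17.3 this raises DeltaError: Generic DeltaTable error: Version mismatch,
# but the delete is still committed.
table_2.delete("k = 1")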

If you inspect the table after the delete, you'll see the row was deleted, and the commit log (_delta_log) contains a 002.json commit recording the successful delete.
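To make the "concurrent" part concrete: a Delta table's transaction log stores each commit as a JSON file in _delta_log named by its zero-padded version number, so in the reproduction the overwrite is version 0, the append is version 1, and the delete is version 2 (the 002.json above). The sketch below is a hypothetical, heavily simplified toy model (ToyDeltaLog and try_commit are made-up names, not delta-rs APIs, and the actions are placeholder dicts rather than real Delta actions). It uses exclusive file creation to mimic put-if-absent commits and omits the conflict checking and retry a real writer may perform, so it does not reproduce the bug; it only shows why a writer holding a stale version has to refresh before it can commit.

```python
import json
import os
import tempfile


class ToyDeltaLog:
    """Hypothetical toy model of a Delta-style commit log (NOT delta-rs).

    Mimics only two ideas: one JSON file per version, named with a
    20-digit zero-padded version number, and put-if-absent commits.
    """

    def __init__(self, root: str) -> None:
        self.log_dir = os.path.join(root, "_delta_log")
        os.makedirs(self.log_dir, exist_ok=True)

    def _commit_path(self, version: int) -> str:
        return os.path.join(self.log_dir, f"{version:020d}.json")

    def latest_version(self) -> int:
        # -1 means the log is empty (no commits yet).
        versions = [int(name[:-5]) for name in os.listdir(self.log_dir) if name.endswith(".json")]
        return max(versions, default=-1)

    def try_commit(self, read_version: int, actions: list) -> int:
        """Write actions as version read_version + 1; fail if it already exists."""
        target = read_version + 1
        try:
            # Mode "x" creates the file exclusively, emulating put-if-absent.
            with open(self._commit_path(target), "x") as f:
                for action in actions:
                    f.write(json.dumps(action) + "\n")
        except FileExistsError:
            raise RuntimeError(f"version {target} already exists; refresh and retry") from None
        return target


log = ToyDeltaLog(tempfile.mkdtemp())
log.try_commit(-1, [{"op": "WRITE"}])  # version 0 (the initial overwrite)

# Both writers read version 0, like table_1 / table_2 in the reproduction.
seen_by_writer_a = log.latest_version()
seen_by_writer_b = log.latest_version()

log.try_commit(seen_by_writer_a, [{"op": "APPEND"}])  # version 1

try:
    # The stale writer also targets version 1, which now exists.
    log.try_commit(seen_by_writer_b, [{"op": "DELETE"}])
    stale_commit_failed = False
except RuntimeError:
    stale_commit_failed = True

# Refreshing first (analogous to update_incremental) lets the delete land at version 2.
delete_version = log.try_commit(log.latest_version(), [{"op": "DELETE"}])
print(stale_commit_failed, delete_version)
```

A real writer does not necessarily give up at the first collision; the toy model stops there to keep the version arithmetic visible.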


ion-elgreco commented 6 months ago

What happens if you do table_2.update_incremental() before running delete?

echai58 commented 6 months ago

@ion-elgreco Yes, running update_incremental before the delete lets it run correctly, which makes sense: the refreshed table_2 already includes the append, so the delete is no longer concurrent with it.