delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0

Deletion `_change_type` does not appear in change data feed #2579

Open Sirz3chs opened 3 weeks ago

Sirz3chs commented 3 weeks ago

Environment

Delta-rs version: 0.18.0

Binding: Python


Bug

What happened: While testing CDF, I think I ran into a bug. No rows with a delete `_change_type` appear in the results, whether I perform an overwrite or a direct delete on the Delta table.

What you expected to happen: I was expecting some delete rows to appear in the CDF. Reproducing the same operations with delta-spark does produce them.

How to reproduce it: I've made a simple Jupyter notebook with examples from the documentation. Here is the Python code to reproduce:

import pandas as pd
from deltalake import write_deltalake, DeltaTable

table_path = "tmp/delta-table"

df = pd.DataFrame({"num": [1, 2, 3], "letter": ["a", "b", "c"]})
write_deltalake(
    table_path,
    df,
    configuration={
        "delta.minWriterVersion": "7",
        "delta.minReaderVersion": "3",
        "delta.enableChangeDataFeed": "true"
    },
    engine="rust"
)

df = pd.DataFrame({"num": [8, 9], "letter": ["dd", "ee"]})
write_deltalake(table_path, df, mode="append", engine="rust")

df = pd.DataFrame({"num": [11, 22], "letter": ["aa", "bb"]})
write_deltalake(table_path, df, mode="overwrite", engine="rust")

dt = DeltaTable(table_path)
dt.delete(predicate="num = 11")

print(dt.load_cdf(starting_version=0).read_pandas())

More details: I also tried update operations, and they appear fine in the CDF.

ion-elgreco commented 3 weeks ago

@Sirz3chs we currently have only limited support for writing CDF files: only the update operation writes them.

Overwrites, predicate overwrites, merge, and delete don't write CDF files yet.

FYI @rtyler
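
Until those operations write CDF files, one possible stopgap is to diff two consecutive table versions by hand. The sketch below is an illustration only, not part of delta-rs: `diff_snapshots` is a hypothetical helper that works on plain lists of row dicts, which could be obtained with something like `DeltaTable(table_path, version=n).to_pandas().to_dict("records")`. It omits the `_commit_version` and `_commit_timestamp` columns that a real change data feed carries, and it reports an update as a delete plus an insert rather than as pre- and post-image rows.

```python
from collections import Counter


def diff_snapshots(before, after):
    """Return CDF-style rows for rows removed from or added to a table.

    `before` and `after` are lists of row dicts taken from two consecutive
    table versions. Row values must be hashable. Duplicate rows are
    respected by counting occurrences.
    """
    def key(row):
        # Order-independent, hashable representation of a row.
        return tuple(sorted(row.items()))

    before_counts = Counter(key(r) for r in before)
    after_counts = Counter(key(r) for r in after)

    changes = []
    # Rows present before but missing afterwards are reported as deletes.
    for k, n in (before_counts - after_counts).items():
        changes.extend({**dict(k), "_change_type": "delete"} for _ in range(n))
    # Rows present afterwards but not before are reported as inserts.
    for k, n in (after_counts - before_counts).items():
        changes.extend({**dict(k), "_change_type": "insert"} for _ in range(n))
    return changes


# Rows from the reproduction script: the table right after the overwrite,
# then after dt.delete(predicate="num = 11").
before = [{"num": 11, "letter": "aa"}, {"num": 22, "letter": "bb"}]
after = [{"num": 22, "letter": "bb"}]

changes = diff_snapshots(before, after)
print(changes)
```

Reading full snapshots scales poorly on large tables, so this is only a reasonable check for small tables like the one in the reproduction.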

Sirz3chs commented 3 weeks ago

Thanks for your quick answer. I spent some time digging into the docs and issues but didn't find this information.

ion-elgreco commented 3 weeks ago

The release notes mention that it was added for the update operation: https://github.com/delta-io/delta-rs/releases/tag/python-v0.18.0