delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0
2k stars 363 forks source link

Delete operation doesn't do type coercion #1921

Open echai58 opened 7 months ago

echai58 commented 7 months ago

Environment

Delta-rs version: 0.13.0

Binding: python bindings

Environment:


Bug

What happened: It seems that when a string predicate is passed into DeltaTable.delete, when it gets parsed as a Datafusion Expression, it is not taking into account the schema of the table. For example, if there is a column with type pa.int32(), and you try to use a predicate like price = 100, it raises an error ValueError: Invalid comparison operation: Int32 <= Int64, which I assume is coming from 100 being parsed as a int64. This is supported by if I pass in price = CAST(100 as INT) instead, it works as expected.

What you expected to happen: The parser should be schema-aware when converting the string predicate to a Datafusion Expression.

How to reproduce it: This is a minimal reproduction:

import tempfile
import pandas as pd
import pyarrow as pa
from deltalake import DeltaTable
from deltalake.writer import write_deltalake

pandas_data = pd.DataFrame.from_dict(
        {
            "price": [100, 150],
            "qty": [15, 20],
        }
    )

schema = pa.schema(
    [
        ("price", pa.int32()),
        ("qty", pa.int32()),
    ]
)

with tempfile.TemporaryDirectory() as path:
    table = pa.Table.from_pandas(pandas_data, schema)
    write_deltalake(
        table_or_uri=path,
        data=table,
        mode="error",
    )

    delta_table = DeltaTable(path) 

    # this does not work
    delta_table.delete(predicate="price = 100")

    # this works
    delta_table.delete(predicate="price = CAST(100 as INT)")
ion-elgreco commented 7 months ago

It looks like we are building directly a physical plan here so that's why there is no type coercion.

We probably need to refactor this.

echai58 commented 3 months ago

@ion-elgreco are there any updates on this?

ion-elgreco commented 3 months ago

@echai58 this PR got merged: https://github.com/delta-io/delta-rs/commit/f56d8c98ce5656ff4aee93aa3861efcdb322eda3, so it should be doable to refactor the delete operation now.

peter-xu-zhou commented 2 months ago

@ion-elgreco when will the predicate problem for delete/update operations be done ? I also met the problem.

ion-elgreco commented 2 months ago

@ion-elgreco when will the predicate problem for delete/update operations be done ? I also met the problem.

I dont have any plans to work on this. But feel free to pick it up