delta-io / delta-rs

A native Rust library for Delta Lake, with bindings into Python
https://delta-io.github.io/delta-rs/
Apache License 2.0

Allow `pyarrow.dataset.Expression` in `filters` kwarg of `deltalake.DeltaTable.to_pyarrow_table` #2597

Closed giacomorebecchi closed 1 week ago

giacomorebecchi commented 2 weeks ago

Description

I would like more flexibility in the `filters` kwarg of `to_pyarrow_table`, in particular the ability to pass a `pyarrow.dataset.Expression` directly.

Use Case

I want to be able to filter out null values, in particular by using a version of this issue:

Related Issue(s)

Also, note that this PR might be an occasion to reduce code duplication by removing this function: https://github.com/delta-io/delta-rs/blob/f0416921a3814a33ea1b3796a2a1468f8c76ca3d/python/deltalake/table.py#L333-L350 in favour of this: https://github.com/apache/arrow/blob/69e8a78c018da88b60f9eb2b3b45703f81f3c93d/python/pyarrow/parquet/core.py#L135-L199