dask-contrib / dask-deltatable

A Delta Lake reader for Dask
BSD 3-Clause "New" or "Revised" License
46 stars 15 forks source link

Can we get rid of `filters_to_expression`? #59

Open j-bennet opened 1 year ago

j-bennet commented 1 year ago

Currently in dask-deltatable, we're using pyarrow.dataset.dataset, which we filter with a pyarrow.Expression:

https://github.com/dask-contrib/dask-deltatable/blob/dbeb8cc3f94ac6bc612e5dab6f8d3440f37455e6/dask_deltatable/core.py#L78

Would the ParquetDataset be more appropriate here? It can accept filters as Expression, or tuple/DNF form, which would allow us to skip that filters_to_expression step.

https://arrow.apache.org/docs/python/generated/pyarrow.parquet.ParquetDataset.html