apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0

[Python] Add is_nan, is_null, is_valid as operators to DNF filters #38750

Open JacekPliszka opened 12 months ago

JacekPliszka commented 12 months ago

Describe the enhancement requested

Currently, pyarrow.parquet.core.filters_to_expression handles the comparison operators (=, ==, !=, <, >, <=, >=) as well as the in and not in operators.

I propose adding is_nan, is_null and is_valid operators. For these, the value in the filter tuple would be ignored, and they would return the field.is_nan(), field.is_null() and field.is_valid() expressions respectively.

This is a very easy change, but it would enable null/NaN filtering in DNF form. These methods are already implemented on pyarrow.dataset.Expression: https://arrow.apache.org/docs/python/generated/pyarrow.dataset.Expression.html

Component(s)

Python

davlee1972 commented 7 months ago

I added several of these, along with a DNF function, in the issue linked below. The “is” and “is not” operators support is None, is True, is False, is not None, is not True, etc.

https://github.com/apache/arrow/issues/39128