Open rouault opened 5 months ago
I guess there's a potential ambiguity of what such filtering means. Would that mean that a row is selected if all corresponding entries in the list match the predicate, or if just one would?
Yes, I think this is the crux of the issue. We would first need an additional scalar kernel that applies the predicate to the list elements and combines the results with a reduction (such as any/all for boolean predicates), so that the resulting kernel is still a scalar kernel for the field (i.e. it preserves the shape and can be used as a filter predicate).
Describe the enhancement requested
This enhancement request would be a continuation of the previous enhancement done in https://github.com/apache/arrow/pull/39065 to support nested fields where the nesting type is a struct.
Here I would like to apply a predicate pushdown on the x subfield of a list<element: struct<x: double not null, y: double not null>>
When trying to apply the following expression as parquet::Dataset::ScanBuilder::Filter(),
I get the following error:
nested paths only supported for structs
(I tried removing that check, but I then get the following error:
Function 'struct_field' has no kernel matching input types (list<element: struct<x: double not null, y: double not null>>)
)
Beyond the technical difficulties in implementing that, I guess there is a potential ambiguity about what such filtering means: would a row be selected only if all corresponding entries in its list match the predicate, or if at least one does? For my use case (spatial filtering applied directly to GeoArrow struct/separated encoded geometry columns, for non-Point geometry types, in GeoParquet files), the latter is what I'm looking for.
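To make the two readings concrete, here is a minimal pure-Python sketch of the semantics only. It is not Arrow code and does not reflect any existing or planned Arrow kernel. The helper `list_predicate`, the sample rows, and the null and empty-list handling are all assumptions made for illustration. The key property is that the result has exactly one value per input row, so it could serve as a filter mask.

```python
from typing import Callable, Iterable, List, Optional

# Hypothetical stand-in for one struct<x: double, y: double> element.
Point = dict


def list_predicate(
    rows: List[Optional[List[Point]]],
    predicate: Callable[[Point], bool],
    reduce: Callable[[Iterable[bool]], bool],
) -> List[Optional[bool]]:
    """Evaluate `predicate` on every element of each row's list, then
    collapse the per-element results with `reduce` (any or all).

    The output has one entry per input row (shape preserving).
    Assumption: a null row yields a null (None) result.
    """
    out: List[Optional[bool]] = []
    for row in rows:
        if row is None:
            out.append(None)
        else:
            out.append(reduce(predicate(p) for p in row))
    return out


# Sample column of type list<struct<x, y>>: five rows, including an
# empty list and a null row.
rows = [
    [{"x": 1.0, "y": 0.0}, {"x": 5.0, "y": 0.0}],
    [{"x": 6.0, "y": 1.0}, {"x": 7.0, "y": 2.0}],
    [{"x": 0.5, "y": 3.0}],
    [],
    None,
]


def x_gt_2(p: Point) -> bool:
    # The element-level predicate: x > 2.0
    return p["x"] > 2.0


any_mask = list_predicate(rows, x_gt_2, any)  # row kept if at least one element matches
all_mask = list_predicate(rows, x_gt_2, all)  # row kept only if every element matches

print(any_mask)  # [True, True, False, False, None]
print(all_mask)  # [False, True, False, True, None]
```

The two masks disagree on the first row (mixed matches) and on the empty list, where `any` gives False and `all` gives True. An actual kernel would have to pick a convention for both cases.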
CC @jorisvandenbossche @paleolimbot
Component(s)
C++, Parquet