Open Blizzara opened 2 weeks ago
Note that the comparison for nested type is not supported in arrow-rs https://github.com/apache/arrow-rs/pull/5942, so we should implement them in datafusion. First attempt #11091
Ideally we should make it configurable so we can support both Spark and Postgres like behaviour for nulls
Is your feature request related to a problem or challenge?
We're working on running some used-to-be-Spark pipelines through DataFusion. One case we've noticed where DataFusion doesn't support something is comparing lists. (Spark allows)[https://github.com/apache/spark/blame/d9394eee5ebbeb695baaec6122da2ed970842dfd/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala#L1025] comparing (==, !=, <, >, <=, >=, ..) columns of structs and lists, while in DataFusion those seem to throw:
For structs, from our internal testing:
For lists, this is shown in DataFusion's tests: https://github.com/apache/datafusion/blob/3773fb7fb54419f889e7d18b73e9eb48069eb08e/datafusion/sqllogictest/test_files/array_query.slt#L44
Maybe this would need to be improved on Arrow directly, seeing that the error is coming from https://github.com/apache/arrow-rs/blob/087f34b70e97ee85e1a54b3c45c5ed814f500b0a/arrow-ord/src/cmp.rs#L219?
Describe the solution you'd like
Binary predicates to be allowed for structs and lists, preferably following same semantics as in Spark (mostly I think it's a DFS over all the fields https://github.com/apache/spark/blob/d9394eee5ebbeb695baaec6122da2ed970842dfd/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/PhysicalDataType.scala#L285)
Describe alternatives you've considered
No response
Additional context
Related to https://github.com/apache/datafusion/issues/2326