apache / datafusion-comet

Apache DataFusion Comet Spark Accelerator
https://datafusion.apache.org/comet
Apache License 2.0

Add more types to BloomFilterAgg #1023

Closed mbutrovich closed 3 weeks ago

mbutrovich commented 1 month ago

What is the problem the feature request solves?

https://github.com/apache/datafusion-comet/pull/987 introduces a native BloomFilterAgg with support for LongType, as in Spark 3.4. Spark 3.5+ added support for the other integer types and for strings.

Describe the potential solution

For the other integer types, this should be easy to handle: cast the value to a long and call the existing put_long bloom filter method, which matches Spark's behavior. For strings, the underlying bloom filter implementation needs a put_bytes method to match Spark's bloom filter behavior.
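
A rough sketch of the shape this could take, written against a toy filter rather than Comet's actual implementation. Everything here is an assumption made for illustration: the `SketchBloomFilter` type, its fields, the `put_int` helper, and the `might_contain_*` methods are hypothetical names, and the hashing uses Rust's `DefaultHasher` as a stand-in, so it will not produce bit patterns compatible with Spark's bloom filter. The only points it is meant to show are that narrower integers can be widened to `i64` and routed through `put_long`, and that strings need a separate byte-oriented entry point.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Toy bloom filter (hypothetical). The hashing is a stand-in and is
/// NOT compatible with Spark's bloom filter layout.
pub struct SketchBloomFilter {
    bits: Vec<u64>,
    num_bits: u64,
    num_hashes: u32,
}

impl SketchBloomFilter {
    pub fn new(num_bits: u64, num_hashes: u32) -> Self {
        assert!(num_bits > 0 && num_hashes > 0);
        let words = ((num_bits + 63) / 64) as usize;
        Self { bits: vec![0; words], num_bits, num_hashes }
    }

    // Stand-in hash: seed followed by the item bytes.
    fn hash_with_seed(bytes: &[u8], seed: u64) -> u64 {
        let mut h = DefaultHasher::new();
        h.write_u64(seed);
        h.write(bytes);
        h.finish()
    }

    // Derive num_hashes bit positions from two base hashes (double hashing).
    fn positions(&self, bytes: &[u8]) -> Vec<u64> {
        let h1 = Self::hash_with_seed(bytes, 0);
        let h2 = Self::hash_with_seed(bytes, h1);
        (1..=self.num_hashes as u64)
            .map(|i| h1.wrapping_add(i.wrapping_mul(h2)) % self.num_bits)
            .collect()
    }

    /// Byte-oriented insert, the entry point strings would use.
    pub fn put_bytes(&mut self, item: &[u8]) {
        for pos in self.positions(item) {
            self.bits[(pos / 64) as usize] |= 1u64 << (pos % 64);
        }
    }

    /// Long insert. In this sketch it simply hashes the little-endian
    /// bytes; a Spark-compatible filter would hash longs its own way.
    pub fn put_long(&mut self, item: i64) {
        self.put_bytes(&item.to_le_bytes());
    }

    pub fn might_contain_bytes(&self, item: &[u8]) -> bool {
        self.positions(item)
            .iter()
            .all(|&pos| (self.bits[(pos / 64) as usize] & (1u64 << (pos % 64))) != 0)
    }

    pub fn might_contain_long(&self, item: i64) -> bool {
        self.might_contain_bytes(&item.to_le_bytes())
    }
}

/// Widen any of i8, i16, i32 (or i64) to i64 and reuse put_long.
/// The conversion is sign-extending, like a cast to long.
pub fn put_int<T: Into<i64>>(filter: &mut SketchBloomFilter, value: T) {
    filter.put_long(value.into());
}
```

With this layout, a call such as `put_int(&mut filter, 42i32)` goes through the same path as `filter.put_long(42)`, while a string value would be inserted with `filter.put_bytes(s.as_bytes())`.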

Additional context

No response