apache / datafusion-comet

Apache DataFusion Comet Spark Accelerator
https://datafusion.apache.org/comet
Apache License 2.0
823 stars 163 forks source link

feat: Support more types with BloomFilterAgg #1039

Closed mbutrovich closed 3 weeks ago

mbutrovich commented 3 weeks ago

Which issue does this PR close?

Closes #1023.

Rationale for this change

What changes are included in this PR?

The other integer types just need a cast. For strings I had to add put_binary to spark_bloom_filter.

How are these changes tested?

Added more types to test for Spark 3.5+.

mbutrovich commented 3 weeks ago

Marking as draft because it's complete for #1023, but I want to check if I need to expand BloomFilterMightContain's support for these other types.

I also want to investigate why this doesn't require new golden plans.

mbutrovich commented 3 weeks ago

If I'm reading this Spark code correctly, BloomFilterAgg supports integers and string types as input, but the probe BloomFilterMightContain only supports Long.

mbutrovich commented 3 weeks ago

I don't see Spark introducing any BloomFilterAggs when I run TPC-DS SF100. Maybe it's just not enough data since the initial design doc talks about TPC-DS with 3TB data set. I could play with the knobs a bit to try to trigger the rewrite in the optimizer, but I'll just call this PR ready.

codecov-commenter commented 3 weeks ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 34.31%. Comparing base (845b654) to head (caf13bd). Report is 2 commits behind head on main.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #1039 +/- ## ============================================ - Coverage 34.46% 34.31% -0.15% + Complexity 888 887 -1 ============================================ Files 113 113 Lines 43580 43574 -6 Branches 9658 9643 -15 ============================================ - Hits 15021 14954 -67 - Misses 25507 25548 +41 - Partials 3052 3072 +20 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

viirya commented 3 weeks ago

Thanks @mbutrovich @andygrove @kazuyukitanimura