Closed mbutrovich closed 3 weeks ago
Marking as draft because it's complete for #1023, but I want to check if I need to expand BloomFilterMightContain's support for these other types.
I also want to investigate why this doesn't require new golden plans.
If I'm reading this Spark code correctly, BloomFilterAgg supports integers and string types as input, but the probe BloomFilterMightContain only supports Long.
I don't see Spark introducing any BloomFilterAggs when I run TPC-DS SF100. Maybe it's just not enough data, since the initial design doc talks about TPC-DS with a 3TB data set. I could play with the knobs a bit to try to trigger the rewrite in the optimizer, but I'll just call this PR ready.
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 34.31%. Comparing base (845b654) to head (caf13bd). Report is 2 commits behind head on main.
Thanks @mbutrovich @andygrove @kazuyukitanimura
Which issue does this PR close?
Closes #1023.
Rationale for this change
What changes are included in this PR?
The other integer types just need a cast. For strings I had to add put_binary to spark_bloom_filter.
How are these changes tested?
Added more types to test for Spark 3.5+.
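For readers unfamiliar with the split between the long and binary insert paths, here is a rough sketch of the idea from "What changes are included in this PR?": integer inputs are widened and go through a long path, while strings go through a byte-oriented path. This is a toy, not the code in this PR. The names ToyBloomFilter, put_value, might_contain_long and might_contain_binary are hypothetical, the hashing (BLAKE2b with double hashing) is an assumption chosen for brevity, and the result is not compatible with Spark's bit layout or hash function. In particular, the toy routes put_long through put_binary, which is a simplification and not a claim about how Spark or spark_bloom_filter hashes longs.

```python
import hashlib


class ToyBloomFilter:
    """Illustrative Bloom filter; NOT Spark's or Comet's layout or hashing."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, data: bytes):
        # Derive num_hashes bit positions by double hashing the two
        # 64-bit halves of one digest (an assumption, not Spark's Murmur3).
        digest = hashlib.blake2b(data, digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little")
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def put_binary(self, data: bytes) -> None:
        # Byte-oriented insert path, the one string inputs need.
        for pos in self._positions(data):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain_binary(self, data: bytes) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(data)
        )

    def put_long(self, value: int) -> None:
        # Simplification: encode as a signed 64-bit value and reuse put_binary.
        self.put_binary(value.to_bytes(8, "little", signed=True))

    def might_contain_long(self, value: int) -> bool:
        return self.might_contain_binary(value.to_bytes(8, "little", signed=True))


def put_value(bf: ToyBloomFilter, value) -> None:
    """Hypothetical dispatcher mirroring 'cast the integers, put_binary the strings'."""
    if isinstance(value, bool):
        raise TypeError("bool is not a supported input type in this sketch")
    if isinstance(value, int):
        # Python ints have no width, so the byte/short/int -> long cast is implicit.
        bf.put_long(value)
    elif isinstance(value, str):
        bf.put_binary(value.encode("utf-8"))
    else:
        raise TypeError(f"unsupported input type: {type(value).__name__}")


if __name__ == "__main__":
    bf = ToyBloomFilter()
    put_value(bf, 42)
    put_value(bf, "spark")
    print(bf.might_contain_long(42), bf.might_contain_binary(b"spark"))
```

A Bloom filter never reports a false negative, so both lookups in the usage example return True. Lookups for values that were never inserted may also return True occasionally, which is why the sketch makes no claim about absent keys.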