datafuselabs / databend

Data, Analytics & AI. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com
https://docs.databend.com
Other
7.29k stars 701 forks

feat: experimental runtime bloom pruning #15382

Open dantengsky opened 2 weeks ago

dantengsky commented 2 weeks ago

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

Implements runtime pruning for probe-side data blocks by utilizing the runtime filter (based on the min-max filter) and the bloom filter index of the probe table.
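As a rough illustration of the idea in the summary, here is a minimal, self-contained sketch. It is not the code in this PR: `BlockBloom`, `BlockMeta`, `RuntimeFilter`, `keep_block`, and `prune_blocks` are hypothetical names, the bloom filter is a toy one rather than Databend's bloom index format, and the sketch assumes the build side of the join yields a small set of distinct keys along with their min and max.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy per-block bloom filter over i64 keys (hypothetical; not Databend's index format).
struct BlockBloom {
    bits: Vec<bool>,
    num_hashes: u64,
}

impl BlockBloom {
    fn new(num_bits: usize, num_hashes: u64) -> Self {
        BlockBloom { bits: vec![false; num_bits], num_hashes }
    }

    // Derive one bit position per seed by hashing (seed, key) together.
    fn slot(&self, key: i64, seed: u64) -> usize {
        let mut hasher = DefaultHasher::new();
        (seed, key).hash(&mut hasher);
        (hasher.finish() % self.bits.len() as u64) as usize
    }

    fn insert(&mut self, key: i64) {
        for seed in 0..self.num_hashes {
            let idx = self.slot(key, seed);
            self.bits[idx] = true;
        }
    }

    // `false` means "definitely absent"; `true` means "possibly present".
    fn may_contain(&self, key: i64) -> bool {
        (0..self.num_hashes).all(|seed| self.bits[self.slot(key, seed)])
    }
}

/// Metadata kept for one probe-side block (hypothetical layout).
struct BlockMeta {
    id: usize,
    min: i64,
    max: i64,
    bloom: BlockBloom,
}

impl BlockMeta {
    fn from_values(id: usize, values: &[i64]) -> Self {
        let mut bloom = BlockBloom::new(1024, 3);
        for &v in values {
            bloom.insert(v);
        }
        BlockMeta {
            id,
            min: values.iter().copied().min().expect("non-empty block"),
            max: values.iter().copied().max().expect("non-empty block"),
            bloom,
        }
    }
}

/// Runtime filter derived from the build side: key range plus distinct keys (assumed small).
struct RuntimeFilter {
    min: i64,
    max: i64,
    keys: Vec<i64>,
}

impl RuntimeFilter {
    fn from_build_keys(keys: &[i64]) -> Option<Self> {
        let min = keys.iter().copied().min()?;
        let max = keys.iter().copied().max()?;
        Some(RuntimeFilter { min, max, keys: keys.to_vec() })
    }
}

/// Keep a block only if its range overlaps the filter's range
/// and its bloom filter reports at least one build-side key as possibly present.
fn keep_block(block: &BlockMeta, filter: &RuntimeFilter) -> bool {
    if block.max < filter.min || block.min > filter.max {
        return false; // pruned by the min-max check alone
    }
    filter.keys.iter().any(|&k| block.bloom.may_contain(k))
}

/// Return the ids of the probe-side blocks that still need to be read.
fn prune_blocks(blocks: &[BlockMeta], filter: &RuntimeFilter) -> Vec<usize> {
    blocks
        .iter()
        .filter(|b| keep_block(b, filter))
        .map(|b| b.id)
        .collect()
}
```

In this sketch the min-max check removes blocks whose key range cannot match, and the bloom check then removes blocks whose range overlaps but which contain none of the build-side keys. A block that does contain a key is never dropped, since a bloom filter has no false negatives; a block that contains none may still be kept on a false positive.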

Tests

Type of change



github-actions[bot] commented 2 weeks ago

Docker Image for PR

note: this image tag is only available for internal use, please check the internal doc for more details.

github-actions[bot] commented 2 weeks ago

ClickBench Report

dantengsky commented 5 days ago

@xudong963 Thanks for helping me review this PR; I really appreciate it. Let me make further adjustments to avoid using the bloom filter in situations where false positives could make bloom pruning nearly ineffective.
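A back-of-the-envelope view of why false positives matter here, assuming each bloom lookup errs independently with false-positive rate p and that k distinct build-side keys are probed against a block containing none of them (both p and k are illustrative symbols, not values taken from this PR):

```latex
% Probability that a block with no matching key still survives bloom pruning
P(\text{block kept}) = 1 - (1 - p)^{k}

% Illustration: p = 0.01, k = 100
1 - 0.99^{100} \approx 0.63
```

Under these assumptions, most non-matching blocks would still be read once k is large, so the lookups cost more than they save.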