apache / cloudberry

One advanced and mature open-source MPP (Massively Parallel Processing) database. Open source alternative to Greenplum Database.
https://cloudberry.apache.org
Apache License 2.0

Push the runtime filter from HashJoin down to SeqScan or AM. #724

Open zhangyue-hashdata opened 23 hours ago

zhangyue-hashdata commented 23 hours ago

+----------+  AttrFilter   +------+  ScanKey   +------------+
| HashJoin | ------------> | Hash | ---------> | SeqScan/AM |
+----------+               +------+            +------------+

If "gp_enable_runtime_filter_pushdown" is on, the following three steps run:

Step 1. In ExecInitHashJoin(), try to map each var in the hash clauses to the corresponding var in the SeqScan. If a mapping is found, save it in an AttrFilter and push the AttrFilter to the Hash node.

Step 2. While the hash table is being built, create the range/bloom filters in each AttrFilter. When the build finishes, convert these filters to a list of ScanKeys and push them down to the SeqScan.

Step 3. If the AM supports SCAN_SUPPORT_RUNTIME_FILTER, push the ScanKeys further down into the AM module; otherwise, use them to filter slots in the SeqScan.
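
To make steps 2 and 3 concrete, here is a minimal Python model of the idea. It is only a sketch and not the C code in this PR: `RangeFilter`, `BloomFilter`, `build_hash_table`, `seq_scan`, and `hash_join` are hypothetical names, the list of filter objects stands in for the list of ScanKeys, and the boolean `am_supports_runtime_filter` stands in for an AM advertising SCAN_SUPPORT_RUNTIME_FILTER.

```python
import hashlib
from typing import Optional


class RangeFilter:
    """Min/max of the join keys seen on the build (inner) side."""

    def __init__(self) -> None:
        self.lo: Optional[int] = None
        self.hi: Optional[int] = None

    def add(self, key: int) -> None:
        self.lo = key if self.lo is None else min(self.lo, key)
        self.hi = key if self.hi is None else max(self.hi, key)

    def may_match(self, key: int) -> bool:
        # An empty build side can never match anything.
        return self.lo is not None and self.lo <= key <= self.hi


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, nbits: int = 1024, nhashes: int = 3) -> None:
        self.nbits, self.nhashes, self.bits = nbits, nhashes, 0

    def _positions(self, key: int):
        for seed in range(self.nhashes):
            digest = hashlib.blake2b(f"{seed}:{key}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "big") % self.nbits

    def add(self, key: int) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def may_match(self, key: int) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def build_hash_table(inner_rows, key_attr):
    """Step 2: fill the hash table and the filters in the same pass."""
    table, filters = {}, [RangeFilter(), BloomFilter()]
    for row in inner_rows:
        key = row[key_attr]
        table.setdefault(key, []).append(row)
        for f in filters:
            f.add(key)
    return table, filters  # `filters` stands in for the list of ScanKeys


def seq_scan(outer_rows, key_attr, scan_keys, am_supports_runtime_filter):
    """Step 3: apply the keys inside the AM if it can, else on each slot."""
    def passes(row):
        return all(k.may_match(row[key_attr]) for k in scan_keys)

    if am_supports_runtime_filter:
        fetched = [r for r in outer_rows if passes(r)]  # AM skips rows itself
        return fetched, len(fetched)
    fetched = list(outer_rows)                          # AM returns every row
    return [r for r in fetched if passes(r)], len(fetched)


def hash_join(inner_rows, outer_rows, inner_key, outer_key, am_supports_runtime_filter):
    """Build, push the filters to the scan, then probe with the survivors."""
    table, scan_keys = build_hash_table(inner_rows, inner_key)
    survivors, fetched = seq_scan(outer_rows, outer_key, scan_keys,
                                  am_supports_runtime_filter)
    joined = [(o, i) for o in survivors for i in table.get(o[outer_key], [])]
    return joined, len(survivors), fetched
```

In this model both paths return the same join result. The difference is only how many rows leave the AM: with AM support the scan hands back just the surviving rows, while without it every row is fetched and then filtered in the scan node. Bloom filter false positives are harmless here because the hash probe still checks the actual key.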

Perf: E5-2680 v2 CPU (10 cores), 32 GB memory, 3 segments

  1. tpcds 10s: off 865s, on 716s (17% less time)
  2. tpcds 100s: off 4592s, on 3751s (18% less time)
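
For reference, the percentages above are the relative reduction in total run time with the GUC on versus off. A small Python check of that arithmetic, using only the timings quoted above (`reduction_pct` is a hypothetical helper name):

```python
def reduction_pct(off_seconds: float, on_seconds: float) -> float:
    """Relative run-time reduction, in percent, of `on` versus `off`."""
    return (off_seconds - on_seconds) / off_seconds * 100


for label, off, on in [("tpcds 10s", 865, 716), ("tpcds 100s", 4592, 3751)]:
    print(f"{label}: {reduction_pct(off, on):.1f}% less time")
```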

Fixes #ISSUE_Number


fanfuxiaoran commented 31 minutes ago

Looks interesting. I have a few questions to discuss.