Open zhangyue-hashdata opened 23 hours ago
Looks interesting. And I have some questions to discuss.
A hashjoin node and a seqscan node that run in the same process can use the runtime filter. This means the tables must have the same distribution policy on the join columns, or one of the tables must be replicated.
+----------+  AttrFilter   +------+  ScanKey   +------------+
| HashJoin | ------------> | Hash | ---------> | SeqScan/AM |
+----------+               +------+            +------------+
If "gp_enable_runtime_filter_pushdown" is on, the following three steps run:
Step 1. In ExecInitHashJoin(), try to find the mapping between the var in hashclauses and the var in SeqScan. If one is found, save it in AttrFilter and push it to the Hash node;
Step 2. While the hash table is being built, create the range/bloom filters in AttrFilter. When the build finishes, these filters are converted to a list of ScanKeys and pushed down to SeqScan;
Step 3. If the AM supports SCAN_SUPPORT_RUNTIME_FILTER, these ScanKeys are pushed further down into the AM module; otherwise they are used to filter slots in SeqScan.
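To make Steps 2 and 3 concrete, here is a minimal standalone sketch of a combined range/bloom filter: it is filled while the build side is consumed and then used to drop scanned keys that cannot join. This is not the code in this PR. `RuntimeFilter` and the `rf_*` functions are hypothetical names, and the sketch assumes int64 join keys, a fixed 1024-bit bitmap, and two bloom probes per key. It also treats an empty build side as "nothing matches", which is only valid for an inner join.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOOM_BITS 1024u            /* illustrative size; must be a power of two */

/* Hypothetical stand-in for the per-column filters held in AttrFilter. */
typedef struct RuntimeFilter
{
    bool    has_value;              /* false until the first build key arrives */
    int64_t min;                    /* range filter: smallest build-side key */
    int64_t max;                    /* range filter: largest build-side key */
    uint8_t bits[BLOOM_BITS / 8];   /* bloom filter bitmap */
} RuntimeFilter;

/* 64-bit mixer (splitmix64 finalizer), used here as the bloom hash. */
static uint64_t
mix64(uint64_t x)
{
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

static void
rf_init(RuntimeFilter *f)
{
    assert(f != NULL);
    memset(f, 0, sizeof(*f));
}

/* Step 2: called once per key while the hash table is being built. */
static void
rf_add(RuntimeFilter *f, int64_t key)
{
    uint64_t h = mix64((uint64_t) key);
    uint32_t b1 = (uint32_t) h & (BLOOM_BITS - 1);
    uint32_t b2 = (uint32_t) (h >> 32) & (BLOOM_BITS - 1);

    if (!f->has_value || key < f->min)
        f->min = key;
    if (!f->has_value || key > f->max)
        f->max = key;
    f->has_value = true;
    f->bits[b1 / 8] |= (uint8_t) (1u << (b1 % 8));
    f->bits[b2 / 8] |= (uint8_t) (1u << (b2 % 8));
}

/* Step 3 fallback: test one scanned key. False means it cannot join. */
static bool
rf_may_match(const RuntimeFilter *f, int64_t key)
{
    uint64_t h = mix64((uint64_t) key);
    uint32_t b1 = (uint32_t) h & (BLOOM_BITS - 1);
    uint32_t b2 = (uint32_t) (h >> 32) & (BLOOM_BITS - 1);

    if (!f->has_value || key < f->min || key > f->max)
        return false;               /* rejected by the range filter */
    return (f->bits[b1 / 8] & (1u << (b1 % 8))) != 0 &&
           (f->bits[b2 / 8] & (1u << (b2 % 8))) != 0;
}

/* Filter scanned keys in place and return how many survive. */
static size_t
rf_filter_scan(const RuntimeFilter *f, int64_t *keys, size_t n)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++)
        if (rf_may_match(f, keys[i]))
            keys[kept++] = keys[i];
    return kept;
}
```

The range check is cheap and rejects keys outside the build side's [min, max] before the bloom bits are consulted. A bloom filter can return false positives but never false negatives, so any key that survives still has to be probed against the hash table by the join itself.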
perf: CPU E5-2680 v2 10 cores, memory 32GB, 3 segments
Fixes #ISSUE_Number
What does this PR do?
Type of Change
Breaking Changes
Test Plan
make installcheck
make -C src/test installcheck-cbdb-parallel
Impact
Performance:
User-facing changes:
Dependencies:
Checklist
Additional Context