Describe This Problem
Consider a query against a partitioned table whose hash partition key is partition_col:
select * from partition_table where partition_col in ("a", "b", "c", ...)
All the sub query plans share the same predicate, and if the in-list is large, the min-max and bloom-filter indexes may perform very poorly. However, most of the values in the in-list do not exist in any one specific partition, so the predicate in each sub query plan can be reduced to a simpler one.
Proposal
Introduce an optimization step that removes, from the in-list predicate of each distributed sub query plan, the values that cannot belong to that plan's partition.
More implementation details need to be worked out before coding.
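As a rough illustration of the idea only, the pruning could look like the sketch below. It is not the engine's implementation: partition_of is a hypothetical stand-in for the table's real hash partition rule (CRC32 modulo the partition count is assumed here purely for demonstration), and prune_in_list is likewise an illustrative name.

```python
import zlib
from collections import defaultdict


def partition_of(value: str, num_partitions: int) -> int:
    """Hypothetical partition rule: CRC32 of the key modulo the partition count.

    The real rule must be the exact hash function the table uses to route rows.
    """
    return zlib.crc32(value.encode("utf-8")) % num_partitions


def prune_in_list(values, num_partitions):
    """Group in-list values by the partition they hash to.

    Returns {partition_id: [values that can exist in that partition]}.
    Partitions that receive no value are absent from the result, so their
    sub query plans could be skipped entirely.
    """
    per_partition = defaultdict(list)
    for value in values:
        per_partition[partition_of(value, num_partitions)].append(value)
    return dict(per_partition)


if __name__ == "__main__":
    in_list = ["a", "b", "c", "d", "e", "f"]
    for pid, pruned in sorted(prune_in_list(in_list, 4).items()):
        # Each sub query plan only carries the values routed to its partition.
        print(pid, pruned)
```

With this shape, each sub query plan evaluates its indexes against a shorter in-list, and the pruning is only correct if the optimizer applies exactly the same hash rule that was used when the rows were written.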