Query improvement about the query partition table

ShiKaiWi commented 10 months ago

Describe This Problem

Consider a query against a partition table whose hash partition key is called partition_col: select * from partition_table where partition_col in ("a", "b", "c", ...).

All the sub query plans share the same predicate, so when the in-list is large, the min-max and bloom-filter indexes may perform very badly. In practice, however, most of the values in the in-list do not exist in any one specific partition, so the predicate in each sub query plan can be reduced to a much simpler one.

Proposal

Introduce an optimization procedure that removes the unnecessary values from the in-list predicate of each distributed sub query plan.
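As a rough illustration of the idea only, here is a minimal Rust sketch that splits one in-list into a smaller in-list per partition. It assumes the partition of a value is hash(value) % num_partitions. The names partition_of and prune_in_list are hypothetical, and DefaultHasher stands in for whatever hash function the real partition rule uses, so this is not the actual HoraeDB code path.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical hash partition rule (an assumption, not HoraeDB's real rule):
/// hash the key and take it modulo the number of partitions.
fn partition_of(value: &str, num_partitions: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish() % num_partitions
}

/// Split the original in-list into one in-list per partition.
/// Entry `i` of the result holds only the values that can exist in partition `i`.
fn prune_in_list(values: &[&str], num_partitions: u64) -> Vec<Vec<String>> {
    assert!(num_partitions > 0, "a partition table needs at least one partition");
    let mut per_partition: Vec<Vec<String>> = vec![Vec::new(); num_partitions as usize];
    for &value in values {
        let idx = partition_of(value, num_partitions) as usize;
        per_partition[idx].push(value.to_string());
    }
    per_partition
}
```

With this split, the sub query plan sent to partition i would carry only per_partition[i] as its in-list, and a partition whose list is empty could be skipped entirely. Whether the rewrite belongs in the planner or in the partition rule itself is left open here.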

More implementation details are necessary before coding.

Additional Context

No response

jiacai2050 commented 3 weeks ago

@zealchen Are you interested in this?

zealchen commented 2 weeks ago

> @zealchen Are you interested in this?

Yes. Let me handle it.

apache / horaedb