Open filimonov opened 4 years ago
Loosely relates #658
BTW, @nikvas0 is interested in using constraint solvers for this task or similar.
Convertation to CNF is implemented in #18787.
EXPLAIN SYNTAX select * from (SELECT number AS A, number AS B, number AS C, number AS D, number AS E FROM numbers(10) ) where
A = 1 and (
B = 1 and (
D = 2 and (E = 3 or E=4 or E=5 /*.... repeated a lot of times */ ) OR
D = 3 and (E = 3 or E=4 or E=5 /*.... repeated a lot of times */ ))
) SETTINGS convert_query_to_cnf = 1;
┌─explain─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ SELECT │
│ A, │
│ B, │
│ C, │
│ D, │
│ E │
│ FROM │
│ ( │
│ SELECT │
│ number AS A, │
│ number AS B, │
│ number AS C, │
│ number AS D, │
│ number AS E │
│ FROM numbers(10) │
│ WHERE ((D = 2) OR (D = 3)) AND ((D = 2) OR (E IN (3, 4, 5))) AND ((D = 3) OR (E IN (3, 4, 5))) AND (B = 1) AND (E IN (3, 4, 5)) │
│ ) │
│ WHERE (E IN (3, 4, 5)) AND ((E IN (3, 4, 5)) OR (D = 2)) AND ((E IN (3, 4, 5)) OR (D = 3)) AND ((D = 2) OR (D = 3)) AND (A = 1) AND (B = 1) │
│ SETTINGS convert_query_to_cnf = 1 │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
WHERE
(E IN (3, 4, 5))
AND ((E IN (3, 4, 5)) OR (D = 2))
AND ((E IN (3, 4, 5)) OR (D = 3))
AND ((D = 2) OR (D = 3))
AND (A = 1) AND (B = 1)
The problem
The table a more than hundred cols but only 3 of them are indexed, let call them
PARTITION A ORDER BY B, C, D
The query use
A
B
andD
and another cols which is not indexedE
. And it's look likeWhile the numbers of condition is small the query is quite fast (150ms), with 50 conditions the query start to take 10s and more.
Strangely if the wheres are re ordered like so :
the query is sub seconds. From the boolean point of view the filter is the same but it seems clickhouse like that a way more than the first one.
Proposed solution
Introduce optimizer(s) to convert logical expression to conjunctive normal form (CNF) or disjunctive normal form (DNF)
CNF is smth like
DNF is smth like
Any logical expression has CNF / DNF form.
In some cases, CNF or DNF may be more preferable and both have more chances to use index than non-normalized conditions.
Extra options
During those optimizations, it's may be possible to detect always true / always false sub-conditions and remove them.
Some references in context of DBs