Open heyciao opened 1 week ago
Explain plans when using "Optimized trivial count" - the result of ~4.7M rows is wrong (default behaviour, "_optimize_use_implicitprojections" and "_optimize_trivial_countquery" not set in the configuration)
:) select count(*)
from doc_chunks
where chunk_lower_and_orig[1] ='aaaaaaaaaaaaaaaaaaaaa'
and doc_id = 999999999999
; select count(*)
from doc_chunks
where chunk_lower_and_orig[1] ='aaaaaaaaaaaaaaaaaaaaa'
and doc_id = 999999999999
;
explain select count(*)
from doc_chunks
where chunk_lower_and_orig[1] ='aaaaaaaaaaaaaaaaaaaaa'
and doc_id = 999999999999
;
explain AST select count(*)
from doc_chunks
where chunk_lower_and_orig[1] ='aaaaaaaaaaaaaaaaaaaaa'
and doc_id = 999999999999
;
explain SYNTAX select count(*)
from doc_chunks
where chunk_lower_and_orig[1] ='aaaaaaaaaaaaaaaaaaaaa'
and doc_id = 999999999999
;
explain QUERY TREE select count(*)
from doc_chunks
where chunk_lower_and_orig[1] ='aaaaaaaaaaaaaaaaaaaaa'
and doc_id = 999999999999
;
explain PLAN select count(*)
from doc_chunks
where chunk_lower_and_orig[1] ='aaaaaaaaaaaaaaaaaaaaa'
and doc_id = 999999999999
;
explain PIPELINE select count(*)
from doc_chunks
where chunk_lower_and_orig[1] ='aaaaaaaaaaaaaaaaaaaaa'
and doc_id = 999999999999
;
SELECT count(*)
FROM doc_chunks
WHERE ((chunk_lower_and_orig[1]) = 'aaaaaaaaaaaaaaaaaaaaa') AND (doc_id = 999999999999)
Query id: 147a94fa-e4d9-415b-9bf9-ac5f74f16b22
┌─count()─┐
1. │ 4702208 │ -- 4.70 million
└─────────┘
1 row in set. Elapsed: 0.005 sec. Processed 24.58 thousand rows, 196.62 KB (5.11 million rows/s., 40.91 MB/s.)
Peak memory usage: 74.24 KiB.
EXPLAIN
SELECT count(*)
FROM doc_chunks
WHERE ((chunk_lower_and_orig[1]) = 'aaaaaaaaaaaaaaaaaaaaa') AND (doc_id = 999999999999)
Query id: 883b5f8f-cc62-4e68-9488-f63715ec98bc
┌─explain──────────────────────────────────────────────┐
1. │ Expression ((Project names + Projection)) │
2. │ AggregatingProjection │
3. │ Expression (Before GROUP BY) │
4. │ Expression │
5. │ ReadFromMergeTree (dbcharidx.doc_chunks) │
6. │ ReadFromPreparedSource (Optimized trivial count) │
└──────────────────────────────────────────────────────┘
6 rows in set. Elapsed: 0.002 sec.
EXPLAIN AST
SELECT count(*)
FROM doc_chunks
WHERE ((chunk_lower_and_orig[1]) = 'aaaaaaaaaaaaaaaaaaaaa') AND (doc_id = 999999999999)
Query id: e669c7c1-4188-436e-a01c-96b24ffbf130
┌─explain─────────────────────────────────────┐
1. │ SelectWithUnionQuery (children 1) │
2. │ ExpressionList (children 1) │
3. │ SelectQuery (children 3) │
4. │ ExpressionList (children 1) │
5. │ Function count (children 1) │
6. │ ExpressionList (children 1) │
7. │ Asterisk │
8. │ TablesInSelectQuery (children 1) │
9. │ TablesInSelectQueryElement (children 1) │
10. │ TableExpression (children 1) │
11. │ TableIdentifier doc_chunks │
12. │ Function and (children 1) │
13. │ ExpressionList (children 2) │
14. │ Function equals (children 1) │
15. │ ExpressionList (children 2) │
16. │ Function arrayElement (children 1) │
17. │ ExpressionList (children 2) │
18. │ Identifier chunk_lower_and_orig │
19. │ Literal UInt64_1 │
20. │ Literal 'aaaaaaaaaaaaaaaaaaaaa' │
21. │ Function equals (children 1) │
22. │ ExpressionList (children 2) │
23. │ Identifier doc_id │
24. │ Literal UInt64_999999999999 │
└─────────────────────────────────────────────┘
24 rows in set. Elapsed: 0.001 sec.
EXPLAIN SYNTAX
SELECT count(*)
FROM doc_chunks
WHERE ((chunk_lower_and_orig[1]) = 'aaaaaaaaaaaaaaaaaaaaa') AND (doc_id = 999999999999)
Query id: bfea2a62-f3c1-4065-9856-67ef3479e7b5
┌─explain─────────────────────────────────────────────────────────────────────────────────┐
1. │ SELECT count() │
2. │ FROM doc_chunks │
3. │ WHERE ((chunk_lower_and_orig[1]) = 'aaaaaaaaaaaaaaaaaaaaa') AND (doc_id = 999999999999) │
└─────────────────────────────────────────────────────────────────────────────────────────┘
3 rows in set. Elapsed: 0.001 sec.
EXPLAIN QUERY TREE
SELECT count(*)
FROM doc_chunks
WHERE ((chunk_lower_and_orig[1]) = 'aaaaaaaaaaaaaaaaaaaaa') AND (doc_id = 999999999999)
Query id: ee1a1912-8677-4bc9-9bc1-b8fb4022b1c0
┌─explain──────────────────────────────────────────────────────────────────────────────────────────────────────────┐
1. │ QUERY id: 0 │
2. │ PROJECTION COLUMNS │
3. │ count() UInt64 │
4. │ PROJECTION │
5. │ LIST id: 1, nodes: 1 │
6. │ FUNCTION id: 2, function_name: count, function_type: aggregate, result_type: UInt64 │
7. │ JOIN TREE │
8. │ TABLE id: 3, alias: __table1, table_name: dbcharidx.doc_chunks │
9. │ WHERE │
10. │ FUNCTION id: 4, function_name: and, function_type: ordinary, result_type: UInt8 │
11. │ ARGUMENTS │
12. │ LIST id: 5, nodes: 2 │
13. │ FUNCTION id: 6, function_name: equals, function_type: ordinary, result_type: UInt8 │
14. │ ARGUMENTS │
15. │ LIST id: 7, nodes: 2 │
16. │ FUNCTION id: 8, function_name: arrayElement, function_type: ordinary, result_type: String │
17. │ ARGUMENTS │
18. │ LIST id: 9, nodes: 2 │
19. │ COLUMN id: 10, column_name: chunk_lower_and_orig, result_type: Array(String), source_id: 3 │
20. │ CONSTANT id: 11, constant_value: UInt64_1, constant_value_type: UInt8 │
21. │ CONSTANT id: 12, constant_value: 'aaaaaaaaaaaaaaaaaaaaa', constant_value_type: String │
22. │ FUNCTION id: 13, function_name: equals, function_type: ordinary, result_type: UInt8 │
23. │ ARGUMENTS │
24. │ LIST id: 14, nodes: 2 │
25. │ COLUMN id: 15, column_name: doc_id, result_type: UInt64, source_id: 3 │
26. │ CONSTANT id: 16, constant_value: UInt64_999999999999, constant_value_type: UInt64 │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
26 rows in set. Elapsed: 0.001 sec.
EXPLAIN
SELECT count(*)
FROM doc_chunks
WHERE ((chunk_lower_and_orig[1]) = 'aaaaaaaaaaaaaaaaaaaaa') AND (doc_id = 999999999999)
Query id: 4ef949ef-2051-4e6b-a985-d28b64af67b4
┌─explain──────────────────────────────────────────────┐
1. │ Expression ((Project names + Projection)) │
2. │ AggregatingProjection │
3. │ Expression (Before GROUP BY) │
4. │ Expression │
5. │ ReadFromMergeTree (dbcharidx.doc_chunks) │
6. │ ReadFromPreparedSource (Optimized trivial count) │
└──────────────────────────────────────────────────────┘
6 rows in set. Elapsed: 0.002 sec.
EXPLAIN PIPELINE
SELECT count(*)
FROM doc_chunks
WHERE ((chunk_lower_and_orig[1]) = 'aaaaaaaaaaaaaaaaaaaaa') AND (doc_id = 999999999999)
Query id: 407e5d1b-1ede-42ac-b9b5-0ebc0b1eac30
┌─explain──────────────────────────────────────────────────────────────┐
1. │ (Expression) │
2. │ ExpressionTransform │
3. │ (AggregatingProjection) │
4. │ (Expression) │
5. │ ExpressionTransform × 2 │
6. │ (Expression) │
7. │ ExpressionTransform × 2 │
8. │ (ReadFromMergeTree) │
9. │ MergeTreeSelect(pool: ReadPool, algorithm: Thread) × 2 0 → 1 │
10. │ (ReadFromPreparedSource) │
11. │ SourceFromSingleChunk 0 → 1 │
└──────────────────────────────────────────────────────────────────────┘
11 rows in set. Elapsed: 0.003 sec.
Explain plans when NOT using "Optimized trivial count" - the result of 0 rows is correct: ( I set in the user's configuration the following parameters:
<optimize_use_implicit_projections>0</optimize_use_implicit_projections>
<optimize_trivial_count_query>0</optimize_trivial_count_query>
)
:) select count(*)
from doc_chunks
where chunk_lower_and_orig[1] ='aaaaaaaaaaaaaaaaaaaaa'
and doc_id = 999999999999
; select count(*)
from doc_chunks
where chunk_lower_and_orig[1] ='aaaaaaaaaaaaaaaaaaaaa'
and doc_id = 999999999999
;
explain select count(*)
from doc_chunks
where chunk_lower_and_orig[1] ='aaaaaaaaaaaaaaaaaaaaa'
and doc_id = 999999999999
;
explain AST select count(*)
from doc_chunks
where chunk_lower_and_orig[1] ='aaaaaaaaaaaaaaaaaaaaa'
and doc_id = 999999999999
;
explain SYNTAX select count(*)
from doc_chunks
where chunk_lower_and_orig[1] ='aaaaaaaaaaaaaaaaaaaaa'
and doc_id = 999999999999
;
explain QUERY TREE select count(*)
from doc_chunks
where chunk_lower_and_orig[1] ='aaaaaaaaaaaaaaaaaaaaa'
and doc_id = 999999999999
;
explain PLAN select count(*)
from doc_chunks
where chunk_lower_and_orig[1] ='aaaaaaaaaaaaaaaaaaaaa'
and doc_id = 999999999999
;
explain PIPELINE select count(*)
from doc_chunks
where chunk_lower_and_orig[1] ='aaaaaaaaaaaaaaaaaaaaa'
and doc_id = 999999999999
;
SELECT count(*)
FROM doc_chunks
WHERE ((chunk_lower_and_orig[1]) = 'aaaaaaaaaaaaaaaaaaaaa') AND (doc_id = 999999999999)
Query id: ed72dce5-e052-4209-8c95-365dc9754348
┌─count()─┐
1. │ 0 │
└─────────┘
1 row in set. Elapsed: 0.014 sec. Processed 4.73 million rows, 37.81 MB (326.00 million rows/s., 2.61 GB/s.)
Peak memory usage: 106.99 KiB.
EXPLAIN
SELECT count(*)
FROM doc_chunks
WHERE ((chunk_lower_and_orig[1]) = 'aaaaaaaaaaaaaaaaaaaaa') AND (doc_id = 999999999999)
Query id: 00b04d6b-492e-487b-81af-ff510cd4e0cc
┌─explain──────────────────────────────────────────┐
1. │ Expression ((Project names + Projection)) │
2. │ Aggregating │
3. │ Expression (Before GROUP BY) │
4. │ Expression │
5. │ ReadFromMergeTree (dbcharidx.doc_chunks) │
└──────────────────────────────────────────────────┘
5 rows in set. Elapsed: 0.002 sec.
EXPLAIN AST
SELECT count(*)
FROM doc_chunks
WHERE ((chunk_lower_and_orig[1]) = 'aaaaaaaaaaaaaaaaaaaaa') AND (doc_id = 999999999999)
Query id: aa78d82c-8727-455d-808d-db6ebf19bd27
┌─explain─────────────────────────────────────┐
1. │ SelectWithUnionQuery (children 1) │
2. │ ExpressionList (children 1) │
3. │ SelectQuery (children 3) │
4. │ ExpressionList (children 1) │
5. │ Function count (children 1) │
6. │ ExpressionList (children 1) │
7. │ Asterisk │
8. │ TablesInSelectQuery (children 1) │
9. │ TablesInSelectQueryElement (children 1) │
10. │ TableExpression (children 1) │
11. │ TableIdentifier doc_chunks │
12. │ Function and (children 1) │
13. │ ExpressionList (children 2) │
14. │ Function equals (children 1) │
15. │ ExpressionList (children 2) │
16. │ Function arrayElement (children 1) │
17. │ ExpressionList (children 2) │
18. │ Identifier chunk_lower_and_orig │
19. │ Literal UInt64_1 │
20. │ Literal 'aaaaaaaaaaaaaaaaaaaaa' │
21. │ Function equals (children 1) │
22. │ ExpressionList (children 2) │
23. │ Identifier doc_id │
24. │ Literal UInt64_999999999999 │
└─────────────────────────────────────────────┘
24 rows in set. Elapsed: 0.001 sec.
EXPLAIN SYNTAX
SELECT count(*)
FROM doc_chunks
WHERE ((chunk_lower_and_orig[1]) = 'aaaaaaaaaaaaaaaaaaaaa') AND (doc_id = 999999999999)
Query id: 41aa27ed-8df7-47c6-9fee-ddb7c2b5b193
┌─explain─────────────────────────────────────────────────────────────────────────────────┐
1. │ SELECT count() │
2. │ FROM doc_chunks │
3. │ WHERE ((chunk_lower_and_orig[1]) = 'aaaaaaaaaaaaaaaaaaaaa') AND (doc_id = 999999999999) │
└─────────────────────────────────────────────────────────────────────────────────────────┘
3 rows in set. Elapsed: 0.001 sec.
EXPLAIN QUERY TREE
SELECT count(*)
FROM doc_chunks
WHERE ((chunk_lower_and_orig[1]) = 'aaaaaaaaaaaaaaaaaaaaa') AND (doc_id = 999999999999)
Query id: 16d5fae9-6c57-473c-9882-6a8790e1f303
┌─explain──────────────────────────────────────────────────────────────────────────────────────────────────────────┐
1. │ QUERY id: 0 │
2. │ PROJECTION COLUMNS │
3. │ count() UInt64 │
4. │ PROJECTION │
5. │ LIST id: 1, nodes: 1 │
6. │ FUNCTION id: 2, function_name: count, function_type: aggregate, result_type: UInt64 │
7. │ JOIN TREE │
8. │ TABLE id: 3, alias: __table1, table_name: dbcharidx.doc_chunks │
9. │ WHERE │
10. │ FUNCTION id: 4, function_name: and, function_type: ordinary, result_type: UInt8 │
11. │ ARGUMENTS │
12. │ LIST id: 5, nodes: 2 │
13. │ FUNCTION id: 6, function_name: equals, function_type: ordinary, result_type: UInt8 │
14. │ ARGUMENTS │
15. │ LIST id: 7, nodes: 2 │
16. │ FUNCTION id: 8, function_name: arrayElement, function_type: ordinary, result_type: String │
17. │ ARGUMENTS │
18. │ LIST id: 9, nodes: 2 │
19. │ COLUMN id: 10, column_name: chunk_lower_and_orig, result_type: Array(String), source_id: 3 │
20. │ CONSTANT id: 11, constant_value: UInt64_1, constant_value_type: UInt8 │
21. │ CONSTANT id: 12, constant_value: 'aaaaaaaaaaaaaaaaaaaaa', constant_value_type: String │
22. │ FUNCTION id: 13, function_name: equals, function_type: ordinary, result_type: UInt8 │
23. │ ARGUMENTS │
24. │ LIST id: 14, nodes: 2 │
25. │ COLUMN id: 15, column_name: doc_id, result_type: UInt64, source_id: 3 │
26. │ CONSTANT id: 16, constant_value: UInt64_999999999999, constant_value_type: UInt64 │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
26 rows in set. Elapsed: 0.001 sec.
EXPLAIN
SELECT count(*)
FROM doc_chunks
WHERE ((chunk_lower_and_orig[1]) = 'aaaaaaaaaaaaaaaaaaaaa') AND (doc_id = 999999999999)
Query id: 0e7fa738-63a9-4d6d-b99b-e72c9409b862
┌─explain──────────────────────────────────────────┐
1. │ Expression ((Project names + Projection)) │
2. │ Aggregating │
3. │ Expression (Before GROUP BY) │
4. │ Expression │
5. │ ReadFromMergeTree (dbcharidx.doc_chunks) │
└──────────────────────────────────────────────────┘
5 rows in set. Elapsed: 0.001 sec.
EXPLAIN PIPELINE
SELECT count(*)
FROM doc_chunks
WHERE ((chunk_lower_and_orig[1]) = 'aaaaaaaaaaaaaaaaaaaaa') AND (doc_id = 999999999999)
Query id: 4a13cafd-968b-4033-b1fa-87fd27bdc3ac
┌─explain──────────────────────────────────────────────────────────────────┐
1. │ (Expression) │
2. │ ExpressionTransform × 8 │
3. │ (Aggregating) │
4. │ Resize 8 → 8 │
5. │ AggregatingTransform × 8 │
6. │ StrictResize 8 → 8 │
7. │ (Expression) │
8. │ ExpressionTransform × 8 │
9. │ (Expression) │
10. │ ExpressionTransform × 8 │
11. │ (ReadFromMergeTree) │
12. │ MergeTreeSelect(pool: ReadPool, algorithm: Thread) × 8 0 → 1 │
└──────────────────────────────────────────────────────────────────────────┘
12 rows in set. Elapsed: 0.003 sec.
Table structure:
Query & its execution:
Problem: The above result (~4.7M rows found) is absolutely wrong as the table doesn't have any row with a "doc_id" of "999999999999":
Workaround: Disable for the user the parameters "_optimize_use_implicitprojections" and "_optimize_trivial_countquery" (disabling only "optimize_trivial_count_query" does not work => related to https://github.com/ClickHouse/ClickHouse/issues/70939 ?)
Now the query returns the correct result (0 rows found):
Other: 1) The table being queried has 2 partitions, but each partition has only 1 part (because I previously ran a "optimize table ... partition ... final"):
2) If I specify explicitly a partition name (e.g. "_where (_partitionvalue.1) = 't'") then the results seem correct but the explain plan shows that "optimized trivial count" is anyway not being used.
3) I'm sorry but even after 2 hours trying I haven't been able to simplify how to reproduce the problem: I did try to create a more simple test table then I inserted a few million rows and then I executed a similar SQL but the results were correct but the explain plan showed that it wasn't using the "optimized trivial count" => I'm not sure what are the conditions needed to use "optimized trivial count"... .
4) Without the "and"-condition, the result matches in both cases (with & without "Optimized trivial count") with 5069372 rows:
5) I will post below the explain plans of the two variants that use & don't use the "Optimized trivial count".
6) Ran the above SQLs with Clickhouse version 24.10.1.2812, I first noticed the problem with 24.9.2.42. Running Clickhouse in a QEMU/KVM VM, storage hosted in the host on ZFS FS 2xZRAID2, storage attached to VM with QEMU/KVM "shared buffers"/"virtiofs".
Thank you :smile: