Open VladRassokhin opened 6 years ago
Here's script to reproduce:
-- Assume uniq(S) = 3000 in origin data
CREATE DATABASE IF NOT EXISTS test;
CREATE TABLE IF NOT EXISTS test.data (
S UInt32,
T UInt32,
C UInt32,
M UInt32
)
ENGINE = MergeTree() PARTITION BY toUInt32(S / 128) ORDER BY (S, C, M);
INSERT INTO test.data (S, T, C, M) SELECT toUInt32(number / 500000),0,0,0 FROM system.numbers LIMIT 1500000000;
select uniq(S) from test.data;
-- 1 rows in set. Elapsed: 2.505 sec. Processed 1.50 billion rows, 6.00 GB (598.84 million rows/s., 2.40 GB/s.)
Results are the same for versions: 18.4.0 (docker tag 18.1.0), 18.10.3 (docker tag 18.10.3), 1.1.54394,
In other words you're proposing to assume that the whole data block have the same value of column (which is a part of PK) if consequent values of that column in index are equal.
I.e. if 2 values of PK in sparse index are A and next is also A, just assume that whole block have value A repeated 8192 times (or index_granularity times) without reading the block from the disk?
I.e. if 2 values of PK in sparse index are A and next is also A, just assume that whole block have value A repeated 8192 times (or index_granularity times) without reading the block from the disk?
Correct, sounds reasonable. AFAIK for now it doesn't work even for simple one-column PrimaryKey
This could be reasoned about partitioning key, but not primary key. In PK it would lead to missing values (block was skipped, but shouldn't been).
@Imaskar index_granularity is used for Primary Key, so if we're not skipping parts (based on Partitioning Key), but only blocks with same indexes there would be no missed values.
Unfortunately both Partitioning and Primary starts with P, so there's a confusion with what PK means :)
when table is ordered by (k1,k2,k3,...) group by k1 can use something like read_in_order and immediately send the group to the client after k1 value changed. (much more memory efficient, faster results).
Also there is one more optimization mentioned here. NOT reading primary key column file ranges if value can be deducted from PK. Maybe we should create one more task for that. :)
This optimization is implemented by @dimarub2000 and prepared for production quality by @CurtizJ. There are a few more things to consider:
@CurtizJ is on vacation, he will clarify after 20th August.
Do we want to make further implovements here? Or we should close it?
Yes:
This task is not finished.
Given table definition
CREATE TABLE t ( A UInt32, B UInt32, C UInt32, D UInt32) ENGINE = MergeTree() PARTITION BY toUInt32(A / 128) ORDER BY (A, C, D) SETTINGS index_granularity = 8192
And ~160bn rows:Requests for field
A
are quite slow:It seems that query could be optimized by looking whether first part of tuple changes in adjacent masks.