ClickHouse / ClickHouse


CPU hotspot on `ColumnString::index` #60993

Open cangyin opened 8 months ago

cangyin commented 8 months ago

This is a flamegraph for an `INSERT INTO SummingMergeTree SELECT ...` query. It shows ClickHouse spending significant CPU time in `DB::ColumnString::index`.

[Image: CPU flamegraph]

Both `DB::ColumnString::index()` frames in the graph are called from `convertToFullColumnIfLowCardinality()` in `IMergeTreeIndexAggregator::update()`.

cangyin commented 1 week ago

The cause is that `convertToFullColumnIfLowCardinality` is called for each data granule when updating BloomFilter/Set indexes, and every call converts the whole column with heavy memory copy operations.
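To make the cost concrete, here is a minimal standalone sketch of that pattern. It is not ClickHouse code: `DictEncodedColumn`, `materialize`, and both counting functions are hypothetical names made up for illustration, and the model only counts string copies. It assumes a dictionary-encoded column is expanded row by row, which is roughly what the `index` frames in the flamegraph suggest.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a dictionary-encoded (LowCardinality-style)
// string column. This is NOT the ClickHouse class, only a model of it.
struct DictEncodedColumn
{
    std::vector<std::string> dictionary;  // distinct values
    std::vector<std::size_t> positions;   // one dictionary position per row

    std::size_t rows() const { return positions.size(); }

    // Expand the whole column into plain strings, one copy per row.
    // `copied` accumulates how many strings were copied in total.
    std::vector<std::string> materialize(std::size_t & copied) const
    {
        std::vector<std::string> full;
        full.reserve(positions.size());
        for (std::size_t pos : positions)
        {
            full.push_back(dictionary[pos]);
            ++copied;
        }
        return full;
    }
};

// Pattern described in the comment above: the whole column is converted
// again for every granule, although each update reads only one granule.
std::size_t copiesWhenConvertingPerGranule(const DictEncodedColumn & col, std::size_t granule_rows)
{
    std::size_t copied = 0;
    if (granule_rows == 0)
        return copied;
    for (std::size_t begin = 0; begin < col.rows(); begin += granule_rows)
    {
        std::vector<std::string> full = col.materialize(copied);
        (void) full;  // an index update would read rows [begin, begin + granule_rows)
    }
    return copied;
}

// For comparison: convert once and let every granule read from the result.
std::size_t copiesWhenConvertingOnce(const DictEncodedColumn & col)
{
    std::size_t copied = 0;
    std::vector<std::string> full = col.materialize(copied);
    (void) full;
    return copied;
}

// Build a small example column: `rows` rows cycling over three distinct values.
DictEncodedColumn makeExampleColumn(std::size_t rows)
{
    DictEncodedColumn col;
    col.dictionary = {"alpha", "beta", "gamma"};
    for (std::size_t i = 0; i < rows; ++i)
        col.positions.push_back(i % col.dictionary.size());
    return col;
}
```

In this model, a column of 1000 rows split into granules of 100 rows costs 10000 string copies when converted per granule and 1000 when converted once, so the work grows with rows times granules rather than with rows alone.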