
ClickHouse Server interrupted during Column Change causes Segmentation faults #6877

Closed: volfco closed this issue 4 years ago

volfco commented 5 years ago

I seem to have stumbled into a weird bug. I have a ReplicatedMergeTree with a column Tags.Key of type Array(String). I executed alter table histograms modify column Tags.Key Array(LowCardinality(String)); to convert the column to LowCardinality. The cluster has two replicated shards; one shard converted successfully. For some reason servers 1 and 2 were restarted during this task, and they are now crash-looping on start.

This happened in a QA environment, so blowing away the data is painful but acceptable, and doing so does bring the cluster back online.
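For context, here is a reconstruction of the table implied by the column list and ZooKeeper path in the trace log below. Only the Tags.Key type, the table name, and the replication path are confirmed; every other type and key is an assumption:

-- Hypothetical reconstruction: column names come from the merge log lines and
-- the ZooKeeper path from the queue-loading line; all types except Tags.Key,
-- as well as the partition and sort keys, are guesses.
CREATE TABLE metrics.histograms
(
    Timestamp     DateTime,
    Client        String,
    Path          String,
    Tags          Nested(Key String, Value String),  -- yields Tags.Key / Tags.Value
    Range         Array(Float64),
    Steps         UInt32,
    IndexNotation UInt8,
    Histogram     Array(UInt64)
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/metrics/histograms/{shard}', '{replica}')
PARTITION BY toUnixTimestamp(toStartOfDay(Timestamp))  -- assumed: part names like 1568005200_41_74_2 look like day-start unix timestamps in the server time zone
ORDER BY (Client, Path, Timestamp);                     -- assumed

-- The conversion that was interrupted:
ALTER TABLE metrics.histograms MODIFY COLUMN Tags.Key Array(LowCardinality(String));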

Here is the server log with the log level set to trace:

2019.09.09 16:52:46.738129 [ 1 ] {} <Information> : Starting ClickHouse 19.13.3.26 with revision 54425
2019.09.09 16:52:46.738255 [ 1 ] {} <Information> Application: starting up
2019.09.09 16:52:46.741336 [ 1 ] {} <Information> StatusFile: Status file /var/lib/clickhouse/status already exists - unclean restart. Contents:
PID: 16952
Started at: 2019-09-09 16:50:01
Revision: 54425

2019.09.09 16:52:46.742052 [ 1 ] {} <Debug> Application: rlimit on number of file descriptors is 262144
2019.09.09 16:52:46.742077 [ 1 ] {} <Debug> Application: Initializing DateLUT.
2019.09.09 16:52:46.742085 [ 1 ] {} <Trace> Application: Initialized DateLUT with time zone 'America/Chicago'.
2019.09.09 16:52:46.742454 [ 1 ] {} <Debug> Application: Configuration parameter 'interserver_http_host' doesn't exist or exists and empty. Will use 'clickhouse-qa1.internal' as replica host.
2019.09.09 16:52:46.743128 [ 1 ] {} <Debug> ConfigReloader: Loading config '/etc/clickhouse-server/users.xml'
2019.09.09 16:52:46.744541 [ 1 ] {} <Information> Application: Loading metadata from /var/lib/clickhouse/
2019.09.09 16:52:46.748605 [ 1 ] {} <Information> DatabaseOrdinary (default): Total 4 tables.
2019.09.09 16:52:46.751709 [ 4 ] {} <Information> BackgroundProcessingPool: Create BackgroundProcessingPool with 16 threads
2019.09.09 16:52:46.756353 [ 5 ] {} <Debug> default.test3: Loading data parts
2019.09.09 16:52:46.756352 [ 4 ] {} <Debug> default.test: Loading data parts
2019.09.09 16:52:46.756370 [ 6 ] {} <Debug> default.test2: Loading data parts
2019.09.09 16:52:46.756408 [ 7 ] {} <Debug> default.test4: Loading data parts
2019.09.09 16:52:46.764709 [ 7 ] {} <Debug> default.test4: Loaded data parts (5 items)
2019.09.09 16:52:46.765050 [ 4 ] {} <Debug> default.test: Loaded data parts (6 items)
2019.09.09 16:52:46.766172 [ 5 ] {} <Debug> default.test3: Loaded data parts (11 items)
2019.09.09 16:52:46.769367 [ 6 ] {} <Debug> default.test2: Loaded data parts (8 items)
2019.09.09 16:52:46.769492 [ 1 ] {} <Information> DatabaseOrdinary (default): Starting up tables.
2019.09.09 16:52:46.769708 [ 6 ] {} <Warning> default.test3: Removing temporary directory /var/lib/clickhouse/data/default/test3/tmp_merge_1568005200_41_223_3
2019.09.09 16:52:46.773116 [ 1 ] {} <Information> DatabaseOrdinary (metrics): Total 2 tables.
2019.09.09 16:52:46.775623 [ 7 ] {} <Information> BackgroundSchedulePool: Create BackgroundSchedulePool with 16 threads
2019.09.09 16:52:46.782206 [ 15 ] {} <Debug> default.test3 (MergerMutator): Selected 9 parts from 1568005200_41_74_2 to 1568005200_218_223_1
2019.09.09 16:52:46.782351 [ 15 ] {} <Debug> default.test3 (MergerMutator): Merging 9 parts: from 1568005200_41_74_2 to 1568005200_218_223_1 into tmp_merge_1568005200_41_223_3
2019.09.09 16:52:46.783003 [ 15 ] {} <Debug> default.test3 (MergerMutator): Selected MergeAlgorithm: Horizontal
2019.09.09 16:52:46.783079 [ 15 ] {} <Trace> MergeTreeSequentialBlockInputStream: Reading 3205 marks from part 1568005200_41_74_2, total 26247168 rows starting from the beginning of the part, columns: Timestamp, Client, Path, Tags.Key, Tags.Value, Range, Steps, IndexNotation, Histogram
2019.09.09 16:52:46.786809 [ 15 ] {} <Trace> MergeTreeSequentialBlockInputStream: Reading 3061 marks from part 1568005200_75_108_2, total 25067520 rows starting from the beginning of the part, columns: Timestamp, Client, Path, Tags.Key, Tags.Value, Range, Steps, IndexNotation, Histogram
2019.09.09 16:52:46.790431 [ 15 ] {} <Trace> MergeTreeSequentialBlockInputStream: Reading 2475 marks from part 1568005200_109_137_2, total 20267008 rows starting from the beginning of the part, columns: Timestamp, Client, Path, Tags.Key, Tags.Value, Range, Steps, IndexNotation, Histogram
2019.09.09 16:52:46.794623 [ 15 ] {} <Trace> MergeTreeSequentialBlockInputStream: Reading 2838 marks from part 1568005200_138_171_2, total 23240704 rows starting from the beginning of the part, columns: Timestamp, Client, Path, Tags.Key, Tags.Value, Range, Steps, IndexNotation, Histogram
2019.09.09 16:52:46.798500 [ 7 ] {} <Trace> ZooKeeper: initialized, hosts: clickhouse-qa1.internal:2181,clickhouse-qa3.internal:2181,tst-2.internal:2181
2019.09.09 16:52:46.805424 [ 7 ] {} <Debug> metrics.histograms: Loading data parts
2019.09.09 16:52:46.806637 [ 15 ] {} <Trace> MergeTreeSequentialBlockInputStream: Reading 2184 marks from part 1568005200_172_199_2, total 17883136 rows starting from the beginning of the part, columns: Timestamp, Client, Path, Tags.Key, Tags.Value, Range, Steps, IndexNotation, Histogram
2019.09.09 16:52:46.808861 [ 15 ] {} <Trace> MergeTreeSequentialBlockInputStream: Reading 471 marks from part 1568005200_200_205_1, total 3850240 rows starting from the beginning of the part, columns: Timestamp, Client, Path, Tags.Key, Tags.Value, Range, Steps, IndexNotation, Histogram
2019.09.09 16:52:46.809664 [ 15 ] {} <Trace> MergeTreeSequentialBlockInputStream: Reading 484 marks from part 1568005200_206_211_1, total 3956736 rows starting from the beginning of the part, columns: Timestamp, Client, Path, Tags.Key, Tags.Value, Range, Steps, IndexNotation, Histogram
2019.09.09 16:52:46.811026 [ 15 ] {} <Trace> MergeTreeSequentialBlockInputStream: Reading 489 marks from part 1568005200_212_217_1, total 3997696 rows starting from the beginning of the part, columns: Timestamp, Client, Path, Tags.Key, Tags.Value, Range, Steps, IndexNotation, Histogram
2019.09.09 16:52:46.817403 [ 15 ] {} <Trace> MergeTreeSequentialBlockInputStream: Reading 503 marks from part 1568005200_218_223_1, total 4112384 rows starting from the beginning of the part, columns: Timestamp, Client, Path, Tags.Key, Tags.Value, Range, Steps, IndexNotation, Histogram
2019.09.09 16:52:47.103587 [ 7 ] {} <Debug> metrics.histograms: Loaded data parts (120 items)
2019.09.09 16:52:47.120394 [ 1 ] {} <Information> DatabaseOrdinary (metrics): Starting up tables.
2019.09.09 16:52:47.121002 [ 24 ] {} <Debug> metrics.histograms (ReplicatedMergeTreeRestartingThread): Activating replica.
2019.09.09 16:52:47.123670 [ 24 ] {} <Debug> metrics.histograms (ReplicatedMergeTreeQueue): Loading queue from /clickhouse/tables/metrics/histograms/1/replicas/metrics_qa_replica.clickhouse-qa1.internal/queue
2019.09.09 16:52:47.124529 [ 24 ] {} <Debug> metrics.histograms (ReplicatedMergeTreeQueue): Having 1 queue entries to load, 0 entries already loaded.
2019.09.09 16:52:47.125035 [ 24 ] {} <Trace> metrics.histograms (ReplicatedMergeTreeQueue): Loaded queue
2019.09.09 16:52:47.125722 [ 24 ] {} <Information> metrics.histograms (ReplicatedMergeTreeQueue): Loading 3 mutation entries: 0000000000 - 0000000002
2019.09.09 16:52:47.126393 [ 25 ] {} <Trace> metrics.histograms (ReplicatedMergeTreeQueue): Marking mutation 0000000000 done because it is <= mutation_pointer (0000000002)
2019.09.09 16:52:47.126445 [ 25 ] {} <Trace> metrics.histograms (ReplicatedMergeTreeQueue): Marking mutation 0000000001 done because it is <= mutation_pointer (0000000002)
2019.09.09 16:52:47.126454 [ 25 ] {} <Trace> metrics.histograms (ReplicatedMergeTreeQueue): Marking mutation 0000000002 done because it is <= mutation_pointer (0000000002)
2019.09.09 16:52:47.128645 [ 1 ] {} <Debug> Application: Loaded metadata.
2019.09.09 16:52:47.128791 [ 1 ] {} <Information> Application: It looks like the process has no CAP_NET_ADMIN capability, 'taskstats' performance statistics will be disabled. It could happen due to incorrect ClickHouse package installation. You could resolve the problem manually with 'sudo setcap cap_net_admin=+ep /usr/bin/clickhouse'. Note that it will not work on 'nosuid' mounted filesystems. It also doesn't work if you run clickhouse-server inside network namespace as it happens in some containers.
2019.09.09 16:52:47.128820 [ 1 ] {} <Information> Application: It looks like the process has no CAP_SYS_NICE capability, the setting 'os_thread_nice' will have no effect. It could happen due to incorrect ClickHouse package installation. You could resolve the problem manually with 'sudo setcap cap_sys_nice=+ep /usr/bin/clickhouse'. Note that it will not work on 'nosuid' mounted filesystems.
2019.09.09 16:52:47.128799 [ 7 ] {} <Debug> DDLWorker: Started DDLWorker thread
2019.09.09 16:52:47.129017 [ 4 ] {} <Debug> DDLWorker: Started DDLWorker cleanup thread
2019.09.09 16:52:47.129463 [ 1 ] {} <Information> Application: Listening http://[::]:8123
2019.09.09 16:52:47.129557 [ 1 ] {} <Information> Application: Listening for connections with native protocol (tcp): [::]:9000
2019.09.09 16:52:47.129628 [ 1 ] {} <Information> Application: Listening for replica communication (interserver) http://[::]:9009
2019.09.09 16:52:47.129670 [ 31 ] {} <Debug> metrics.histograms: Will rename Tags%2EKey%20converting.mrk2 to Tags%2EKey.mrk2, rename Tags%2EKey%20converting.bin to Tags%2EKey.bin, rename Tags%2EKey%20converting.dict.bin to Tags%2EKey.dict.bin, rename Tags%2EKey%20converting.dict.mrk2 to Tags%2EKey.dict.mrk2 in part 1567054800_0_135803_53_135806
2019.09.09 16:52:47.129711 [ 31 ] {} <Trace> MergeTreeSequentialBlockInputStream: Reading 0 marks from part 1567054800_0_135803_53_135806, total 0 rows starting from the beginning of the part, columns: Tags.Key
2019.09.09 16:52:47.130034 [ 1 ] {} <Information> Application: Available RAM: 31.25 GiB; physical cores: 10; logical cores: 10.
2019.09.09 16:52:47.130058 [ 1 ] {} <Information> Application: Ready for connections.
2019.09.09 16:52:47.130628 [ 46 ] {} <Error> BaseDaemon: ########################################
2019.09.09 16:52:47.130716 [ 46 ] {} <Error> BaseDaemon: (version 19.13.3.26) (from thread 31) Received signal Segmentation fault (11).
2019.09.09 16:52:47.130756 [ 46 ] {} <Error> BaseDaemon: Address: NULL pointer. Access: read. Address not mapped to object.
2019.09.09 16:52:47.136918 [ 7 ] {} <Debug> DDLWorker: Processing tasks
2019.09.09 16:52:47.136937 [ 4 ] {} <Debug> DDLWorker: Cleaning queue
2019.09.09 16:52:47.137425 [ 7 ] {} <Debug> DDLWorker: Waiting a watch
2019.09.09 16:52:47.162211 [ 46 ] {} <Error> BaseDaemon: 0. clickhouse-server(StackTrace::StackTrace(ucontext const&)+0x30) [0x6f29000]
1. clickhouse-server() [0x327ebf5]
2. /lib64/libpthread.so.0(+0xf5d0) [0x7f2875e3b5d0]
3. clickhouse-server() [0x5ca5b5b]
4. clickhouse-server(DB::DataTypeLowCardinality::serializeBinaryBulkStateSuffix(DB::IDataType::SerializeBinaryBulkSettings&, std::shared_ptr<DB::IDataType::SerializeBinaryBulkState>&) const+0x1f) [0x5ca856f]
5. clickhouse-server(DB::DataTypeArray::serializeBinaryBulkStateSuffix(DB::IDataType::SerializeBinaryBulkSettings&, std::shared_ptr<DB::IDataType::SerializeBinaryBulkState>&) const+0x55) [0x5c7f375]
6. clickhouse-server(DB::MergedColumnOnlyOutputStream::writeSuffixAndGetChecksums()+0x1c7) [0x60fdf17]
7. clickhouse-server(DB::MergeTreeData::alterDataPart(DB::NamesAndTypesList const&, std::vector<std::shared_ptr<DB::ASTIndexDeclaration>, std::allocator<std::shared_ptr<DB::ASTIndexDeclaration> > > const&, bool, std::unique_ptr<DB::MergeTreeData::AlterDataPartTransaction, std::default_delete<DB::MergeTreeData::AlterDataPartTransaction> >&)+0x7ed) [0x6072cdd]
8. clickhouse-server(DB::ReplicatedMergeTreeAlterThread::run()+0x6fd) [0x610323d]
9. clickhouse-server(DB::BackgroundSchedulePoolTaskInfo::execute()+0xd8) [0x633aaf8]
10. clickhouse-server(DB::BackgroundSchedulePool::threadFunction()+0xaa) [0x633b54a]
11. clickhouse-server() [0x633b6f4]
12. clickhouse-server(ThreadPoolImpl<std::thread>::worker(std::_List_iterator<std::thread>)+0x1b8) [0x3191c88]
13. clickhouse-server() [0x76cc0ff]
14. /lib64/libpthread.so.0(+0x7dd5) [0x7f2875e33dd5]
15. /lib64/libc.so.6(clone+0x6d) [0x7f287585aead]

2019.09.09 16:52:47.230368 [ 21 ] {} <Debug> metrics.histograms (ReplicatedMergeTreeQueue): Not executing log entry for part 1568005200_177053_177055_1 because merges and mutations are cancelled now.
... snipped 44 lines ...
2019.09.09 16:52:49.118423 [ 17 ] {} <Debug> metrics.histograms (ReplicatedMergeTreeQueue): Not executing log entry for part 1568005200_177053_177055_1 because merges and mutations are cancelled now.
2019.09.09 16:52:49.131760 [ 42 ] {} <Debug> ConfigReloader: Loading config '/etc/clickhouse-server/config.xml'
2019.09.09 16:52:49.143156 [ 21 ] {} <Debug> metrics.histograms (ReplicatedMergeTreeQueue): Not executing log entry for part 1568005200_177053_177055_1 because merges and mutations are cancelled now.
... snipped 195 lines ...
2019.09.09 16:52:57.055624 [ 21 ] {} <Debug> metrics.histograms (ReplicatedMergeTreeQueue): Not executing log entry for part 1568005200_177053_177055_1 because merges and mutations are cancelled now.
KochetovNicolai commented 5 years ago

Hi.

The bug is weird indeed. I haven't reproduced it yet (I tried various ALTERs with corner cases). I will try crashing the server after the ALTER; maybe that will help.

It would be very helpful if you could provide a reproducible example.

gongled commented 5 years ago

Unfortunately, we hit the same error that @volfco described, in our production environment during execution of an ALTER query. Our ClickHouse cluster (version 19.13.2.19) contains two nodes and one replicated shard.

Please have a look at the log below (sensitive data is obscured):

2019.10.29 16:30:47.046077 [ 19 ] {} <Debug> example_db..inner.mt_log_view: Loaded data parts (124 items)
2019.10.29 16:30:47.109759 [ 34 ] {} <Debug> StorageKafka (kafka_deliver): Started streaming to 1 attached views
2019.10.29 16:30:47.110376 [ 32 ] {} <Debug> StorageKafka (kafka_mt): Started streaming to 2 attached views
2019.10.29 16:30:47.129903 [ 1 ] {} <Information> DatabaseOrdinary (example_db): Starting up tables.
2019.10.29 16:30:47.130742 [ 27 ] {} <Debug> example_db..inner.mt_log_view (ReplicatedMergeTreeRestartingThread): Activating replica.
2019.10.29 16:30:47.137119 [ 27 ] {} <Debug> example_db..inner.mt_log_view (ReplicatedMergeTreeQueue): Loading queue from /clickhouse/tables/cluster/1/example_db.mt_log_view/replicas/ch1.example.tld/queue
2019.10.29 16:30:47.137470 [ 27 ] {} <Debug> example_db..inner.mt_log_view (ReplicatedMergeTreeQueue): Having 2 queue entries to load, 0 entries already loaded.
2019.10.29 16:30:47.141770 [ 1 ] {} <Information> DatabaseOrdinary (example_db): Total 6 tables.
2019.10.29 16:30:47.143999 [ 31 ] {} <Debug> example_db..inner.mt_log_view: Will rename ref_num%20converting.mrk2 to ref_num.mrk2, rename ref_num%20converting.bin to ref_num.bin in part 20190924_0_540_63
2019.10.29 16:30:47.144368 [ 58 ] {} <Error> BaseDaemon: ########################################
2019.10.29 16:30:47.144426 [ 58 ] {} <Error> BaseDaemon: (version 19.13.2.19) (from thread 31) Received signal Segmentation fault (11).
2019.10.29 16:30:47.144452 [ 58 ] {} <Error> BaseDaemon: Address: 0x140 Access: read. Address not mapped to object.

SQL query:

ALTER TABLE example_db.`.inner.mt_log_view` ON CLUSTER cluster
MODIFY COLUMN ref_num UInt16;
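For context, `.inner.mt_log_view` is the implicit storage table that ClickHouse creates for a materialized view declared without a TO clause, which is why the ALTER targets that name. A hypothetical reconstruction of the setup implied by the log (the names and the ZooKeeper path come from the log; the column list, the event_time column, and the SELECT are assumptions):

-- Hypothetical sketch only; the real view definition is not shown in the report.
CREATE MATERIALIZED VIEW example_db.mt_log_view
ENGINE = ReplicatedMergeTree('/clickhouse/tables/cluster/1/example_db.mt_log_view', 'ch1.example.tld')
PARTITION BY toDate(event_time)  -- assumed; part names like 20190924_0_540_63 imply date partitioning
ORDER BY ref_num                 -- assumed
AS SELECT event_time, ref_num    -- column list assumed, except ref_num
FROM example_db.kafka_mt;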
filimonov commented 4 years ago

Looks similar to #5289

filimonov commented 4 years ago

@gongled I'm not sure you are hitting the same bug. Can you please attach the full stack trace, i.e. the lines that follow "Address not mapped to object"?

filimonov commented 4 years ago

An unsuccessful attempt to reproduce the bug:

DROP TABLE IF EXISTS test_low_cardinality_alter;

CREATE TABLE test_low_cardinality_alter
(
    `id`             String,
    sample_type      String,
    inserted_on      DateTime DEFAULT now(),
    inserted_on_date Date     DEFAULT toDate(inserted_on),
    sign             Int8     DEFAULT toInt8(1)
) ENGINE = CollapsingMergeTree(sign) PARTITION BY substring(id, 1, 1) ORDER BY id SETTINGS index_granularity = 8192;

INSERT INTO test_low_cardinality_alter (id, sample_type, sign)
SELECT
    toString(rand() % 8191),
    reinterpretAsFixedString(toUInt8(32 + rand(1) % 26)) || reinterpretAsFixedString(toUInt8(32 + rand(2) % 44)),
    1
FROM numbers(20000000);
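The snippet stops at the INSERT; the conversion under test is not shown in the comment, but would presumably be something like:

-- Presumed next step, not part of the original snippet:
ALTER TABLE test_low_cardinality_alter MODIFY COLUMN sample_type LowCardinality(String);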
KochetovNicolai commented 4 years ago

@filimonov to reproduce it, you probably need to kill the server in the middle of the ALTER. It still may not be trivial to reproduce (I converted all non-PK columns on our test server several months ago and saw no problems). @volfco @gongled is it possible for you to send me the data that causes the segfault?
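A rough outline of the kill-during-alter experiment, continuing from the repro attempt above (every step here is an assumption, not a confirmed reproduction):

-- Hypothetical outline, reusing test_low_cardinality_alter and the ALTER
-- sketched after the previous comment:
-- 1. Run the ALTER; with 20M rows it should take long enough to interrupt.
-- 2. While 'sample_type%20converting.*' files are appearing inside the part
--    directories, kill the clickhouse-server process uncleanly (kill -9).
-- 3. Restart the server: on startup it renames the leftover '%20converting'
--    files to their final names, and in both logs above that rename is the
--    point where the segfault fires.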