Closed: volfco closed this issue 4 years ago
Hi.
The bug is weird indeed. I haven't reproduced it yet (I tried different ALTERs with corner cases). I will try to crash the server after an ALTER; maybe that will help.
It would be very helpful if you could provide a reproducible example.
Unfortunately, we hit the same error that @volfco described earlier, in our production environment during execution of an ALTER
query. Our ClickHouse cluster (version 19.13.2.19) contains two nodes and one replicated shard.
Please have a look at the log below (sensitive data is obscured):
2019.10.29 16:30:47.046077 [ 19 ] {} <Debug> example_db..inner.mt_log_view: Loaded data parts (124 items)
2019.10.29 16:30:47.109759 [ 34 ] {} <Debug> StorageKafka (kafka_deliver): Started streaming to 1 attached views
2019.10.29 16:30:47.110376 [ 32 ] {} <Debug> StorageKafka (kafka_mt): Started streaming to 2 attached views
2019.10.29 16:30:47.129903 [ 1 ] {} <Information> DatabaseOrdinary (example_db): Starting up tables.
2019.10.29 16:30:47.130742 [ 27 ] {} <Debug> example_db..inner.mt_log_view (ReplicatedMergeTreeRestartingThread): Activating replica.
2019.10.29 16:30:47.137119 [ 27 ] {} <Debug> example_db..inner.mt_log_view (ReplicatedMergeTreeQueue): Loading queue from /clickhouse/tables/cluster/1/example_db.mt_log_view/replicas/ch1.example.tld/queue
2019.10.29 16:30:47.137470 [ 27 ] {} <Debug> example_db..inner.mt_log_view (ReplicatedMergeTreeQueue): Having 2 queue entries to load, 0 entries already loaded.
2019.10.29 16:30:47.141770 [ 1 ] {} <Information> DatabaseOrdinary (example_db): Total 6 tables.
2019.10.29 16:30:47.143999 [ 31 ] {} <Debug> example_db..inner.mt_log_view: Will rename ref_num%20converting.mrk2 to ref_num.mrk2, rename ref_num%20converting.bin to ref_num.bin in part 20190924_0_540_63
2019.10.29 16:30:47.144368 [ 58 ] {} <Error> BaseDaemon: ########################################
2019.10.29 16:30:47.144426 [ 58 ] {} <Error> BaseDaemon: (version 19.13.2.19) (from thread 31) Received signal Segmentation fault (11).
2019.10.29 16:30:47.144452 [ 58 ] {} <Error> BaseDaemon: Address: 0x140 Access: read. Address not mapped to object.
SQL query:
ALTER TABLE
example_db.`.inner.mt_log_view` ON CLUSTER cluster
MODIFY
COLUMN ref_num UInt16
Looks similar to #5289
@gongled, not sure if you have the same issue. Can you please attach the full stack trace, i.e. the lines which follow "Address not mapped to object"?
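Those lines can be pulled out of the server error log with grep's `-A` (after-context) option. A minimal sketch, using a synthetic log fragment for demonstration; the real default log path (commonly `/var/log/clickhouse-server/clickhouse-server.err.log`) and the frame lines below are assumptions, not taken from this report:

```shell
# Build a synthetic error-log fragment for illustration only; in a real
# setup, run the grep at the bottom against the actual server error log.
cat > /tmp/ch_err_sample.log <<'EOF'
2019.10.29 16:30:47 <Error> BaseDaemon: Address: 0x140 Access: read. Address not mapped to object.
2019.10.29 16:30:47 <Error> BaseDaemon: 0. (first frame of the trace)
2019.10.29 16:30:47 <Error> BaseDaemon: 1. (second frame of the trace)
EOF
# Print the marker line plus up to 40 lines after it (the stack frames).
grep -A 40 'Address not mapped to object' /tmp/ch_err_sample.log
```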
An unsuccessful attempt to reproduce the bug:
DROP TABLE IF EXISTS test_low_cardinality_alter;
CREATE TABLE test_low_cardinality_alter
(
    `id` String,
    `sample_type` String,
    `inserted_on` DateTime DEFAULT now(),
    `inserted_on_date` Date DEFAULT toDate(inserted_on),
    `sign` Int8 DEFAULT toInt8(1)
)
ENGINE = CollapsingMergeTree(sign)
PARTITION BY substring(id, 1, 1)
ORDER BY id
SETTINGS index_granularity = 8192;
INSERT INTO test_low_cardinality_alter (id, sample_type, sign)
SELECT
    toString(rand() % 8191),
    reinterpretAsFixedString(toUInt8(32 + rand(1) % 26)) || reinterpretAsFixedString(toUInt8(32 + rand(2) % 44)),
    1
FROM numbers(20000000);
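The comment does not show the ALTER that was run against this table. Given the table name and the bug being chased, the step presumably being exercised was a MODIFY COLUMN to LowCardinality, something like the following (this statement is an assumption, not part of the original comment):

```sql
-- Hypothetical ALTER this repro attempt was presumably exercising
-- (not shown in the original comment).
ALTER TABLE test_low_cardinality_alter
    MODIFY COLUMN sample_type LowCardinality(String);
```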
@filimonov to reproduce it you probably need to kill the server in the middle of an ALTER. It still may not be trivial to reproduce (I converted all non-PK columns on our test server several months ago and saw no problems). @volfco @gongled, is it possible for you to send me your data which causes the segfault?
I seem to have stumbled into a weird bug. I have a ReplicatedMergeTree with a column Tags.Key that is an Array(String). I executed
alter table histograms modify column Tags.Key Array(LowCardinality(String));
to change the column to LowCardinality. The cluster has two replicated shards; one shard converted successfully. For some reason, servers 1 and 2 were restarted during this task, and they are now crash looping on start. This happened in a QA environment, so blowing away the data sucks, but that brings the cluster back online.
Here is the server log set to trace: