Open karthik-thiyagarajan opened 3 weeks ago
For now, I've deleted the query from the task queue and am now able to bring up the cluster. But I'm still not sure why a single drop column statement can cause the entire server to go down.
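For anyone hitting the same crash loop: before deleting anything, the stuck entry can usually be located first. A sketch of how that might look (the znode path shown is ClickHouse's default and may differ in your config; the filter values are illustrative):

```sql
-- Inspect pending ON CLUSTER DDL tasks via the built-in system table
SELECT entry, host, status, exception_text
FROM system.distributed_ddl_queue
WHERE status != 'Finished';

-- The underlying ZooKeeper znodes live under the configured distributed_ddl
-- path (/clickhouse/task_queue/ddl by default); once the stuck entry is
-- identified, it can be removed there with a ZooKeeper client.
```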
Do you have a VIEW or MATERIALIZED VIEW which is querying this table?
Yes, we have one materialized view accessing this table.
Could you please share the create table, the create materialized view, the alter add column, and the alter drop column statements?
============================================================================================
CREATE MATERIALIZED VIEW av.mv_stats_aggregated TO av.stats_aggregated
(
    datapoint_timestamp DateTime,
    tid String,
    sid String,
    bkd Bool,
    ail LowCardinality(String),
    anl LowCardinality(String),
    ac String,
    an String,
    wc String,
    wn String,
    are Enum8('NA' = 0, 'TW' = 1, 'L' = 2, 'M' = 3, 'S' = 4, 'H' = 5),
    ahe String,
    stb UInt64,
    srb UInt64,
    all AggregateFunction(avg, UInt32),
    awl AggregateFunction(avg, UInt32),
    aq AggregateFunction(avg, UInt8),
    ucm AggregateFunction(uniq, String)
)
AS SELECT
    toStartOfHour(timestamp) AS datapoint_timestamp, tid, sid, bkd, ail, anl, ac, acn, wc, wn, are,
    multiIf((usl = 1) AND (utl = 1), 'usatl', (usl = 1) AND (utl = 0), 'usl', (usl = 0) AND (utl = 1), 'utl', 'undefined') AS ahe,
    sum(tb) AS stb, sum(rb) AS srb, avgState(sart) AS all, avgState(cart) AS awl, avgState(qoe) AS aq,
    uniqState(cm) AS ucm
FROM av.av_stats_all_v3
WHERE ((bkd = true) OR (rb > 0) OR (tb > 0)) AND ((usl = 1) OR (utl = 1)) AND (rb < 1125899906842624) AND (tb < 1125899906842624)
GROUP BY
    toStartOfHour(timestamp), tid, sid, bkd, ail, anl, ac, acn, wc, wn, are, ahe
============================================================================================
CREATE TABLE av.av_stats_all_v3
(
    timestamp DateTime64(3),
    s String,
    tid String,
    sid String,
    cm String,
    si String,
    sp String,
    di String,
    dp String,
    ac String,
    wc String,
    wup String,
    ur String,
    sn String,
    bkd Bool,
    pf Bool,
    attgf Bool,
    lp UInt64,
    rp UInt64,
    rb UInt64,
    tb UInt64,
    rtm DateTime64(3),
    sdm String,
    dtm DateTime64(3),
    dm String,
    bid String,
    un String,
    vi UInt64,
    at UInt32,
    as Bool,
    vhs String,
    vhd String,
    vts Array(String),
    vtd Array(String),
    cmd String,
    acn String,
    drr String,
    rdu String,
    wcn String,
    usl UInt8 DEFAULT 1,
    utl UInt8 DEFAULT 1,
    it UInt32,
    vp UInt32,
    st String,
    dt String,
    vis String,
    vid String,
    vns String,
    vnd String,
    car UInt32,
    sar UInt32,
    tllp UInt32,
    twlp UInt32,
    qoe UInt8,
    cn String,
    si4 String,
    si6 String,
    dp String,
    vn String,
    dn String,
    ced String,
    tsv String,
    adht Enum8('ADHTU' = 0, 'ADHTPB' = 1, 'ADHTPT' = 2) DEFAULT 0,
    prt UInt8,
    fhdn String,
    fhds String,
    are Enum8('Not_Evaluated' = 0, 'TW' = 1, 'L' = 2, 'M' = 3, 'S' = 4, 'H' = 5) DEFAULT 0,
    wcnl LowCardinality(String),
    casl LowCardinality(String),
    cadl LowCardinality(String),
    mte Enum8('UNKNOWN' = 0, 'GPMT' = 1, 'APMT' = 2, 'AEMT' = 3) DEFAULT 0,
    sccte Enum8('CTU' = 0, 'CTW' = 1, 'CTW' = 2) DEFAULT 0,
    ail LowCardinality(String),
    acl LowCardinality(String),
    wcl LowCardinality(String),
    onl LowCardinality(String),
    mvl LowCardinality(String),
    cl LowCardinality(String),
    col LowCardinality(String),
    cdail LowCardinality(String),
    anl LowCardinality(String),
    acnl LowCardinality(String),
    dvte Enum8('DTU' = 0, 'DTAP' = 1, 'DTS' = 2, 'DTG' = 3) DEFAULT 0,
    dpte Enum8('DTU' = 0, 'DTI' = 1, 'DTIVC' = 2, 'DTC2C' = 3) DEFAULT 0,
    re Enum8('Unknown' = 0, 'TW' = 1, 'L' = 2, 'M' = 3, 'S' = 4, 'H' = 5) DEFAULT 0,
    ccl LowCardinality(FixedString(2)),
    cfl LowCardinality(String),
    cart UInt32,
    sart UInt32,
    sn String,
    fhdsi String
)
ENGINE = Distributed('{cluster}', 'av', 'av_local', rand())
============================================================================================
I have obfuscated the column names. We also have a local table with the same spec as the distributed table given above, and it's a ReplicatedMergeTree table. Let me know if this helps.
============================================================================================
We tried to add a column that already exists and then dropped the same column.
And you ran something like alter table av.av_stats_all_v3 add column s String, it failed, and then you executed alter table av.av_stats_all_v3 drop column s? Or did you alter the local table?
I have a local table and a distributed table, and also a materialized view on top of this, as given above. I ran the drop column on the local table first and it dropped the column successfully. Then I dropped the same column on the distributed table, and that's when it didn't drop the column; instead the pod went into CrashLoopBackOff with the above error.
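For clarity, the sequence described above amounts to something like the following (column s is taken from the obfuscated DDL and used here for illustration; av_local is the underlying table named in the Distributed engine clause):

```sql
-- 1. On the local ReplicatedMergeTree table: succeeded
ALTER TABLE av.av_local DROP COLUMN s;

-- 2. On the Distributed table: this is where the crash loop started
ALTER TABLE av.av_stats_all_v3 DROP COLUMN s;

-- Note: on clustered setups, schema changes are usually issued once with
-- ON CLUSTER so that all replicas' local tables stay in sync, e.g.:
-- ALTER TABLE av.av_local ON CLUSTER '{cluster}' DROP COLUMN s;
```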
@karthik-thiyagarajan you have a duplicate in the enum: 'CTW' = 1, 'CTW' = 2.
Is it a typo from the obfuscation?
Yes, sorry for that. There could be some typos from obfuscating the column names. Let me know if you want me to fix it and provide a clean DDL.
I tried to reproduce it with your schema, but I failed. Everything works for me.
Can you confirm that you followed the steps below and still see it working?
This is the scenario in which we saw the failure.
We are using a 3-shard, 3-replica ClickHouse setup in a Kubernetes cluster with 3 ZooKeeper replica pods. We tried to add a column that already exists and then dropped that same column, and it caused all the pods to crash. We are on ClickHouse version 24.6 and are not using any experimental features.
I've obfuscated the db name and columns for obvious reasons. I'm unable to bring up any ClickHouse pods now; please help me recover from this failure.
: While sending /var/lib/clickhouse/store/9cb/9cbafb32-5bc1-4a88-9146-9b705f93ffe9/shard2_all_replicas/1.bin. (ALL_CONNECTION_TRIES_FAILED), Stack trace (when copying this message, always include the lines below):
=========