Altinity / clickhouse-backup

Tool for easy backup and restore for ClickHouse® using object storage for backup files.
https://altinity.com

error can't create table `posthog`.`sharded_session_recording_events`: code: 253 #501

Closed · davidlbyrne closed this issue 2 years ago

davidlbyrne commented 2 years ago

I'm trying to test restoring a backup from cluster A to a new cluster B.

  1. Exec into the ClickHouse pod, clickhouse-server container: k exec -it chi-clickhouse-qa-qa-restore-0-1-0 -c clickhouse -- /bin/bash
  2. If the posthog DB exists, drop the empty posthog DB from cluster B (the new cluster); this is what existed from a fresh install.
  3. Exec into the ClickHouse pod, clickhouse-backup container: k exec -it chi-clickhouse-qa-qa-restore-0-0-0 -c clickhouse-backup -- /bin/bash
  4. Run the restore: clickhouse-backup restore chi-clickhouse-qa-qa-01-0-0-full-2022-08-20-18-00-01

The problem is the restore process runs for a minute and then exits with an error:

2022/08/21 16:53:00.399797 info CREATE TABLE posthog.sharded_session_recording_events (uuid UUID, timestamp DateTime64(6, 'UTC'), team_id Int64, distinct_id String, session_id String, window_id String, snapshot_data String, created_at DateTime64(6, 'UTC'), has_full_snapshot Int8 MATERIALIZED JSONExtractBool(snapshot_data, 'has_full_snapshot') COMMENT 'column_materializer::has_full_snapshot', _timestamp DateTime, _offset UInt64) ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/{shard}/posthog.session_recording_events', '{replica}', _timestamp) PARTITION BY toYYYYMMDD(timestamp) ORDER BY (team_id, toHour(timestamp), session_id, timestamp, uuid) TTL toDate(created_at) + toIntervalWeek(3) SETTINGS index_granularity = 512
2022/08/21 16:53:00.414937 error can't create table `posthog`.`sharded_session_recording_events`: code: 253, message: Replica /clickhouse/tables/0/posthog.session_recording_events/replicas/chi-clickhouse-qa-qa-restore-0-0 already exists after 32 times, please check your schema dependencies
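A quick way to confirm the leftover metadata the error is complaining about (a sketch, assuming the system.zookeeper table is readable on the restore cluster; the path comes from the error message above) is to list the replica znodes for the table:

SELECT name
FROM system.zookeeper
WHERE path = '/clickhouse/tables/0/posthog.session_recording_events/replicas';

If chi-clickhouse-qa-qa-restore-0-0 is listed here before the table exists on the new cluster, the replica metadata in ZooKeeper is stale.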

Slach commented 2 years ago

On step 2, did you run DROP DATABASE ... SYNC ?
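For example, a synchronous drop of the posthog database from step 2 looks like this; with the Atomic database engine a plain DROP is deferred, and SYNC forces it to finish before returning:

DROP DATABASE IF EXISTS posthog SYNC;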

try

clickhouse-backup restore --rm chi-clickhouse-qa-qa-01-0-0-full-2022-08-20-18-00-01

Check that your qa-restore Kubernetes cluster uses a different ZooKeeper than your production cluster.
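As a sketch of that check: on each cluster you can list, per replicated table, which ZooKeeper path its replica registers under; identical zookeeper_path values on two clusters that share a ZooKeeper ensemble is exactly the collision seen in the error above:

SELECT database, table, zookeeper_path, replica_path
FROM system.replicas;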

The error means you still have data in ZooKeeper. As a quick workaround you could run SYSTEM DROP REPLICA 'chi-clickhouse-qa-qa-restore-0-0';

before executing clickhouse-backup restore.
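A sketch of that workaround using the explicit ZKPATH form, with the table path taken from the error message above (note that ClickHouse refuses to drop a replica whose table is still active locally, so run this before the restore creates the table):

SYSTEM DROP REPLICA 'chi-clickhouse-qa-qa-restore-0-0' FROM ZKPATH '/clickhouse/tables/0/posthog.session_recording_events';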

davidlbyrne commented 2 years ago

Thank you, Slach. It turned out that we were sharing a ZooKeeper instance between both clusters, cluster A and cluster B. After creating a dedicated ZooKeeper instance per cluster and adding SYNC to the drop command, everything worked.