A Distributed table doesn't contain data itself. When you back up data, you back up the underlying table.
Look at the underlying table, whose name you used when creating the Distributed table.
Check the Distributed tables on ch1:
SELECT hostName() h, database, table, engine_full FROM cluster('sdns',system.tables) WHERE engine='Distributed';
After that, could you share the output of:
SELECT hostName() h, database, table, engine_full FROM cluster('sdns',system.tables) WHERE table IN ( ... the underlying tables you get from engine_full ... );
Also, could you share this from ch2:
SELECT hostName() h, * FROM cluster('snds',system.macros)
Distributed tables from ch1:
-- changed "table" to "name" because there is no column "table"
SELECT hostName() h, database, name, engine_full FROM cluster('cluster1',system.tables) WHERE engine='Distributed';
-- shard1
SELECT
hostName() AS h,
database,
name,
engine_full
FROM cluster('cluster1', system.tables)
WHERE engine = 'Distributed'
┌─h───────────────────────────┬─database─┬─name────────────────────┬─engine_full─────────────────────────────────────────────────────────────┐
│ ip-x.y.z-6.ec2.internal │ default │ table1 │ Distributed('cluster1', 'default', 'table1_local', rand()) │
│ ip-x.y.z-6.ec2.internal │ default │ table2 │ Distributed('cluster1', 'default', 'table2_local', rand()) │
│ ip-x.y.z-6.ec2.internal │ default │ table3 │ Distributed('cluster1', 'default', 'table3_local', rand()) │
│ ip-x.y.z-6.ec2.internal │ default │ table4 │ Distributed('cluster1', 'default', 'table4_local', rand()) │
│ ip-x.y.z-6.ec2.internal │ default │ table5 │ Distributed('cluster1', 'default', 'table5_local', rand()) │
│ ip-x.y.z-6.ec2.internal │ default │ table6 │ Distributed('cluster1', 'default', 'table6_local', rand()) │
│ ip-x.y.z-6.ec2.internal │ default │ table7 │ Distributed('cluster1', 'default', 'table7_local', rand()) │
└─────────────────────────────┴──────────┴─────────────────────────┴─────────────────────────────────────────────────────────────────────────┘
┌─h───────────────────────────┬─database─┬─name────────────────────┬─engine_full─────────────────────────────────────────────────────────────┐
│ ip-x.y.z-7.ec2.internal │ default │ table1 │ Distributed('cluster1', 'default', 'table1_local', rand()) │
│ ip-x.y.z-7.ec2.internal │ default │ table2 │ Distributed('cluster1', 'default', 'table2_local', rand()) │
│ ip-x.y.z-7.ec2.internal │ default │ table3 │ Distributed('cluster1', 'default', 'table3_local', rand()) │
│ ip-x.y.z-7.ec2.internal │ default │ table4 │ Distributed('cluster1', 'default', 'table4_local', rand()) │
│ ip-x.y.z-7.ec2.internal │ default │ table5 │ Distributed('cluster1', 'default', 'table5_local', rand()) │
│ ip-x.y.z-7.ec2.internal │ default │ table6 │ Distributed('cluster1', 'default', 'table6_local', rand()) │
│ ip-x.y.z-7.ec2.internal │ default │ table7 │ Distributed('cluster1', 'default', 'table7_local', rand()) │
└─────────────────────────────┴──────────┴─────────────────────────┴─────────────────────────────────────────────────────────────────────────┘
14 rows in set. Elapsed: 0.013 sec.
-- shard2
SELECT
hostName() AS h,
database,
name,
engine_full
FROM cluster('cluster1', system.tables)
WHERE engine = 'Distributed'
┌─h───────────────────────────┬─database─┬─name────────────────────┬─engine_full─────────────────────────────────────────────────────────────┐
│ ip-x.y.z-8.ec2.internal │ default │ table1 │ Distributed('cluster1', 'default', 'table1_local', rand()) │
│ ip-x.y.z-8.ec2.internal │ default │ table2 │ Distributed('cluster1', 'default', 'table2_local', rand()) │
│ ip-x.y.z-8.ec2.internal │ default │ table3 │ Distributed('cluster1', 'default', 'table3_local', rand()) │
│ ip-x.y.z-8.ec2.internal │ default │ table4 │ Distributed('cluster1', 'default', 'table4_local', rand()) │
│ ip-x.y.z-8.ec2.internal │ default │ table5 │ Distributed('cluster1', 'default', 'table5_local', rand()) │
│ ip-x.y.z-8.ec2.internal │ default │ table6 │ Distributed('cluster1', 'default', 'table6_local', rand()) │
│ ip-x.y.z-8.ec2.internal │ default │ table7 │ Distributed('cluster1', 'default', 'table7_local', rand()) │
└─────────────────────────────┴──────────┴─────────────────────────┴─────────────────────────────────────────────────────────────────────────┘
┌─h───────────────────────────┬─database─┬─name────────────────────┬─engine_full─────────────────────────────────────────────────────────────┐
│ ip-x.y.z-5.ec2.internal │ default │ table1 │ Distributed('cluster1', 'default', 'table1_local', rand()) │
│ ip-x.y.z-5.ec2.internal │ default │ table2 │ Distributed('cluster1', 'default', 'table2_local', rand()) │
│ ip-x.y.z-5.ec2.internal │ default │ table3 │ Distributed('cluster1', 'default', 'table3_local', rand()) │
│ ip-x.y.z-5.ec2.internal │ default │ table4 │ Distributed('cluster1', 'default', 'table4_local', rand()) │
│ ip-x.y.z-5.ec2.internal │ default │ table5 │ Distributed('cluster1', 'default', 'table5_local', rand()) │
│ ip-x.y.z-5.ec2.internal │ default │ table6 │ Distributed('cluster1', 'default', 'table6_local', rand()) │
│ ip-x.y.z-5.ec2.internal │ default │ table7 │ Distributed('cluster1', 'default', 'table7_local', rand()) │
└─────────────────────────────┴──────────┴─────────────────────────┴─────────────────────────────────────────────────────────────────────────┘
14 rows in set. Elapsed: 0.016 sec.
Second query on ch1, but I still don't understand what the difference is between your first two queries:
-- shard1
SELECT
hostName() AS h,
database,
name,
engine_full
FROM cluster('cluster1', system.tables)
WHERE name IN ('table1', 'table2', 'table3', 'table4', 'table5', 'table6', 'table7')
┌─h───────────────────────────┬─database─┬─name────────────────────┬─engine_full─────────────────────────────────────────────────────────────┐
│ ip-x.y.z-6.ec2.internal │ default │ table1 │ Distributed('cluster1', 'default', 'table1_local', rand()) │
│ ip-x.y.z-6.ec2.internal │ default │ table2 │ Distributed('cluster1', 'default', 'table2_local', rand()) │
│ ip-x.y.z-6.ec2.internal │ default │ table3 │ Distributed('cluster1', 'default', 'table3_local', rand()) │
│ ip-x.y.z-6.ec2.internal │ default │ table4 │ Distributed('cluster1', 'default', 'table4_local', rand()) │
│ ip-x.y.z-6.ec2.internal │ default │ table5 │ Distributed('cluster1', 'default', 'table5_local', rand()) │
│ ip-x.y.z-6.ec2.internal │ default │ table6 │ Distributed('cluster1', 'default', 'table6_local', rand()) │
│ ip-x.y.z-6.ec2.internal │ default │ table7 │ Distributed('cluster1', 'default', 'table7_local', rand()) │
└─────────────────────────────┴──────────┴─────────────────────────┴─────────────────────────────────────────────────────────────────────────┘
┌─h───────────────────────────┬─database─┬─name────────────────────┬─engine_full─────────────────────────────────────────────────────────────┐
│ ip-x.y.z-8.ec2.internal │ default │ table1 │ Distributed('cluster1', 'default', 'table1_local', rand()) │
│ ip-x.y.z-8.ec2.internal │ default │ table2 │ Distributed('cluster1', 'default', 'table2_local', rand()) │
│ ip-x.y.z-8.ec2.internal │ default │ table3 │ Distributed('cluster1', 'default', 'table3_local', rand()) │
│ ip-x.y.z-8.ec2.internal │ default │ table4 │ Distributed('cluster1', 'default', 'table4_local', rand()) │
│ ip-x.y.z-8.ec2.internal │ default │ table5 │ Distributed('cluster1', 'default', 'table5_local', rand()) │
│ ip-x.y.z-8.ec2.internal │ default │ table6 │ Distributed('cluster1', 'default', 'table6_local', rand()) │
│ ip-x.y.z-8.ec2.internal │ default │ table7 │ Distributed('cluster1', 'default', 'table7_local', rand()) │
└─────────────────────────────┴──────────┴─────────────────────────┴─────────────────────────────────────────────────────────────────────────┘
14 rows in set. Elapsed: 0.024 sec.
-- shard2
SELECT
hostName() AS h,
database,
name,
engine_full
FROM cluster('cluster1', system.tables)
WHERE name IN ('table1', 'table2', 'table3', 'table4', 'table5', 'table6', 'table7')
┌─h───────────────────────────┬─database─┬─name────────────────────┬─engine_full─────────────────────────────────────────────────────────────┐
│ ip-x.y.z-8.ec2.internal │ default │ table1 │ Distributed('cluster1', 'default', 'table1_local', rand()) │
│ ip-x.y.z-8.ec2.internal │ default │ table2 │ Distributed('cluster1', 'default', 'table2_local', rand()) │
│ ip-x.y.z-8.ec2.internal │ default │ table3 │ Distributed('cluster1', 'default', 'table3_local', rand()) │
│ ip-x.y.z-8.ec2.internal │ default │ table4 │ Distributed('cluster1', 'default', 'table4_local', rand()) │
│ ip-x.y.z-8.ec2.internal │ default │ table5 │ Distributed('cluster1', 'default', 'table5_local', rand()) │
│ ip-x.y.z-8.ec2.internal │ default │ table6 │ Distributed('cluster1', 'default', 'table6_local', rand()) │
│ ip-x.y.z-8.ec2.internal │ default │ table7 │ Distributed('cluster1', 'default', 'table7_local', rand()) │
└─────────────────────────────┴──────────┴─────────────────────────┴─────────────────────────────────────────────────────────────────────────┘
┌─h───────────────────────────┬─database─┬─name────────────────────┬─engine_full─────────────────────────────────────────────────────────────┐
│ ip-x.y.z-5.ec2.internal │ default │ table1 │ Distributed('cluster1', 'default', 'table1_local', rand()) │
│ ip-x.y.z-5.ec2.internal │ default │ table2 │ Distributed('cluster1', 'default', 'table2_local', rand()) │
│ ip-x.y.z-5.ec2.internal │ default │ table3 │ Distributed('cluster1', 'default', 'table3_local', rand()) │
│ ip-x.y.z-5.ec2.internal │ default │ table4 │ Distributed('cluster1', 'default', 'table4_local', rand()) │
│ ip-x.y.z-5.ec2.internal │ default │ table5 │ Distributed('cluster1', 'default', 'table5_local', rand()) │
│ ip-x.y.z-5.ec2.internal │ default │ table6 │ Distributed('cluster1', 'default', 'table6_local', rand()) │
│ ip-x.y.z-5.ec2.internal │ default │ table7 │ Distributed('cluster1', 'default', 'table7_local', rand()) │
└─────────────────────────────┴──────────┴─────────────────────────┴─────────────────────────────────────────────────────────────────────────┘
14 rows in set. Elapsed: 0.034 sec.
Query from ch2:
-- shard1
SELECT
hostName() AS h,
*
FROM cluster('cluster1', system.macros)
┌─h──────────────────────────┬─macro───┬─substitution─┐
│ ip-x-y-z-21.ec2.internal │ replica │ x.y.z.21 │
│ ip-x-y-z-21.ec2.internal │ shard │ 1 │
└────────────────────────────┴─────────┴──────────────┘
┌─h──────────────────────────┬─macro───┬─substitution─┐
│ ip-x-y-z-22.ec2.internal │ replica │ x.y.z.22 │
│ ip-x-y-z-22.ec2.internal │ shard │ 1 │
└────────────────────────────┴─────────┴──────────────┘
┌─h──────────────────────────┬─macro───┬─substitution─┐
│ ip-x-y-z-23.ec2.internal │ replica │ x.y.z.23 │
│ ip-x-y-z-23.ec2.internal │ shard │ 2 │
└────────────────────────────┴─────────┴──────────────┘
-- shard2
SELECT
hostName() AS h,
*
FROM cluster('cluster1', system.macros)
┌─h──────────────────────────┬─macro───┬─substitution─┐
│ ip-x-y-z-22.ec2.internal │ replica │ x.y.z.22 │
│ ip-x-y-z-22.ec2.internal │ shard │ 1 │
└────────────────────────────┴─────────┴──────────────┘
┌─h──────────────────────────┬─macro───┬─substitution─┐
│ ip-x-y-y-21.ec2.internal │ replica │ x.y.y.21 │
│ ip-x-y-y-21.ec2.internal │ shard │ 1 │
└────────────────────────────┴─────────┴──────────────┘
┌─h──────────────────────────┬─macro───┬─substitution─┐
│ ip-x-y-y-23.ec2.internal │ replica │ x.y.y.23 │
│ ip-x-y-y-23.ec2.internal │ shard │ 2 │
└────────────────────────────┴─────────┴──────────────┘
Why do you run queries separately on shard1 and shard2???
A properly installed ClickHouse cluster should contain shard1, shard2, and shard3 together in the remote_servers config section, according to your system.clusters.
And I don't see a shard macro equal to 3 in your query result for system.macros from ch2.
Regarding "Second query on ch1, but I still don't understand what the difference is between your first two queries": the second query should show you ReplicatedMergeTree ...
Could you share the output of the following query from ch2?
SELECT hostName() h, database, name, engine_full FROM cluster('sdns',system.tables) WHERE name LIKE '%local%' ORDER BY database, name, h
I ran the queries separately on shard1 and shard2 just to give more output here. Maybe this output can help. Yes, I know there is no practical sense in it.
Redeployed a clean CH cluster (ch2) and restored the data in this manner:
# shard1 node1:
clickhouse-backup --config /opt/clickhouse/clickhouse-backup/config.yml restore_remote \
full_ch_backup_shard1_2021-11-01 # restore from shard1 ch1 backup
# shard1 node2:
clickhouse-backup --config /opt/clickhouse/clickhouse-backup/config.yml restore_remote \
-s full_ch_backup_shard1_2021-11-01 # restore from shard1 ch1 backup
# shard2 node1:
clickhouse-backup --config /opt/clickhouse/clickhouse-backup/config.yml restore_remote \
full_ch_backup_shard2_2021-11-01 # restore from shard2 ch1 backup
# shard2 node2:
# did nothing just for test
Now the issue is that all the data is duplicated between the two shards. The tables in the two shards are identical :(
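A quick way to confirm the duplication is to compare row counts of one underlying table across all replicas (a minimal sketch; default.table1_local and the cluster1 name are just the ones from this thread):
-- Row count of one local table on every replica; identical counts across shards indicate duplicated data
SELECT
    hostName() AS h,
    count() AS total_rows
FROM clusterAllReplicas('cluster1', default.table1_local)
GROUP BY h
ORDER BY h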
Running your query on ch2
:
SELECT
hostName() AS h,
database,
name,
engine_full
FROM cluster('cluster1', system.tables)
WHERE name LIKE '%local%'
ORDER BY
database ASC,
name ASC,
h ASC
h  database  name  engine_full
ip-x-y-z-21.ec2.internal  default  table1_local  ReplicatedMergeTree('/clickhouse/tables/{shard}/table1_local', '{replica}') PARTITION BY toYYYYMM(req_time) ORDER BY (toYYYYMM(req_time), client_id, service_id) TTL del_time SETTINGS index_granularity = 8192, merge_with_ttl_timeout = 3600
ip-x-y-z-22.ec2.internal  default  table1_local  ReplicatedMergeTree('/clickhouse/tables/{shard}/table1_local', '{replica}') PARTITION BY toYYYYMM(req_time) ORDER BY (toYYYYMM(req_time), client_id, service_id) TTL del_time SETTINGS index_granularity = 8192
ip-x-y-z-21.ec2.internal  default  table2_local  ReplicatedMergeTree('/clickhouse/tables/{shard}/table2_local', '{replica}') PARTITION BY toYYYYMM(req_time) ORDER BY (toYYYYMM(req_time), req_id) SETTINGS index_granularity = 8192
ip-x-y-z-22.ec2.internal  default  table2_local  ReplicatedMergeTree('/clickhouse/tables/{shard}/table2_local', '{replica}') PARTITION BY toYYYYMM(req_time) ORDER BY (toYYYYMM(req_time), req_id) SETTINGS index_granularity = 8192
ip-x-y-z-21.ec2.internal  default  table3_local  ReplicatedSummingMergeTree('/clickhouse/tables/{shard}/table3_local', '{replica}') PARTITION BY toYYYYMM(hour) ORDER BY (hour, client_id, user_id, rejection_category, zvelo_cat_parent) TTL del_time SETTINGS index_granularity = 8192
ip-x-y-z-22.ec2.internal  default  table3_local  ReplicatedSummingMergeTree('/clickhouse/tables/{shard}/table3_local', '{replica}') PARTITION BY toYYYYMM(hour) ORDER BY (hour, client_id, user_id, rejection_category, zvelo_cat_parent) TTL del_time SETTINGS index_granularity = 8192
ip-x-y-z-21.ec2.internal  default  table4_local
ip-x-y-z-22.ec2.internal  default  table4_local
ip-x-y-z-21.ec2.internal  default  table5_local  ReplicatedMergeTree('/clickhouse/tables/{shard}/table5_local', '{replica}') PARTITION BY toYYYYMM(req_time) ORDER BY (toYYYYMM(req_time), client_id, user_id) TTL del_time SETTINGS index_granularity = 8192
ip-x-y-z-22.ec2.internal  default  table5_local  ReplicatedMergeTree('/clickhouse/tables/{shard}/table5_local', '{replica}') PARTITION BY toYYYYMM(req_time) ORDER BY (toYYYYMM(req_time), client_id, user_id) TTL del_time SETTINGS index_granularity = 8192
ip-x-y-z-21.ec2.internal  default  table6_local  ReplicatedMergeTree('/clickhouse/tables/{shard}/table6_local', '{replica}') PARTITION BY toYYYYMM(req_time) ORDER BY (toYYYYMM(req_time), client_id, user_id) SETTINGS index_granularity = 8192
ip-x-y-z-22.ec2.internal  default  table6_local  ReplicatedMergeTree('/clickhouse/tables/{shard}/table6_local', '{replica}') PARTITION BY toYYYYMM(req_time) ORDER BY (toYYYYMM(req_time), client_id, user_id) SETTINGS index_granularity = 8192
-- two further rows were garbled in the original output; only a fragment of their engine definition is recoverable:
-- ...YYYMMDD(req_time) ORDER BY (req_time, client_id, user_id) TTL del_time SETTINGS index_granularity = 8192
14 rows in set. Elapsed: 0.016 sec.
-- macros
SELECT hostName() h, * FROM cluster('cluster1',system.macros)
ip-x-y-z-22.ec2.internal replica x.y.z.22
ip-x-y-z-22.ec2.internal shard 1
ip-x-y-z-21.ec2.internal replica x.y.z.21
ip-x-y-z-21.ec2.internal shard 1
ip-x-y-z-23.ec2.internal replica x.y.z.23
ip-x-y-z-23.ec2.internal shard 2
For some reason, from the macros output we have:
ip-x-y-z-22.ec2.internal shard 1 - it's the first node of shard2
ip-x-y-z-21.ec2.internal shard 1 - ok
ip-x-y-z-23.ec2.internal shard 2 - it's the first node of shard3
remote_servers config:
<remote_servers replace="true">
    <cluster1>
        <!-- shard 01 -->
        <shard>
            <internal_replication>true</internal_replication>
            <replica>
                <host>x.y.z.21</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>x.y.y.21</host>
                <port>9000</port>
            </replica>
        </shard>
        <!-- shard 02 -->
        <shard>
            <internal_replication>true</internal_replication>
            <replica>
                <host>x.y.z.22</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>x.y.y.22</host>
                <port>9000</port>
            </replica>
        </shard>
        <!-- shard 03 -->
        <shard>
            <internal_replication>true</internal_replication>
            <replica>
                <host>x.y.z.23</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>x.y.y.23</host>
                <port>9000</port>
            </replica>
        </shard>
    </cluster1>
</remote_servers>
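To double-check what layout each server actually loaded from this config, system.clusters can be queried directly on every node (a sketch; the cluster name cluster1 is the one from the config above):
-- Shard/replica layout derived from remote_servers, as seen by the node running the query
SELECT cluster, shard_num, replica_num, host_name, host_address, port, is_local
FROM system.clusters
WHERE cluster = 'cluster1'
ORDER BY shard_num, replica_num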
Regarding "shard1 node2: clickhouse-backup --config /opt/clickhouse/clickhouse-backup/config.yml restore_remote \ -s full_ch_backup_shard1_2021-11-01 # restore from shard1 ch1 backup":
It looks like you will double the data for shard1 here, because you restore the data twice and it will then be replicated between the two replicas.
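One way to see where data parts physically landed after the restore (a sketch, assuming the cluster1 name and the *_local table naming used in this thread):
-- Active part and row counts per replica for every local table
SELECT
    hostName() AS h,
    database,
    table,
    count() AS active_parts,
    sum(rows) AS total_rows
FROM clusterAllReplicas('cluster1', system.parts)
WHERE active AND table LIKE '%_local'
GROUP BY h, database, table
ORDER BY database, table, h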
Could you check the macros again and use clusterAllReplicas instead of cluster?
SELECT hostName() h, * FROM clusterAllReplicas('snds',system.macros)
Regarding "shard1 node2: clickhouse-backup --config /opt/clickhouse/clickhouse-backup/config.yml restore_remote -s full_ch_backup_shard1_2021-11-01 # restore from shard1 ch1 backup":
Here I restored only the schema on shard1 node2; the data was restored in the previous step on shard1 node1. Or is there no need to restore even just the schema on shard1 node2 if everything was already restored on shard1 node1? That is my main question about the algorithm for backing up and restoring sharded data.
SELECT hostName() h, * FROM clusterAllReplicas('cluster1',system.macros)
ip-x-y-z-21.ec2.internal replica x.y.z.21
ip-x-y-z-21.ec2.internal shard 1
ip-x-y-z-23.ec2.internal replica x.y.z.23
ip-x-y-z-23.ec2.internal shard 2
ip-x-y-z-22.ec2.internal replica x.y.z.22
ip-x-y-z-22.ec2.internal shard 1
ip-x-y-y-23.ec2.internal replica x.y.y.23
ip-x-y-y-23.ec2.internal shard 2
ip-x-y-y-22.ec2.internal replica x.y.y.22
ip-x-y-y-22.ec2.internal shard 1
ip-x-y-y-21.ec2.internal replica x.y.y.21
ip-x-y-y-21.ec2.internal shard 1
Regarding "Here I restored only the schema on shard1 node2": I don't see a --schema parameter here for restoring only the schema.
system.macros on ch2 also looks broken: you have 3 shards in your remote_servers but only two shards in system.macros.
Check the <macros><shard> section in the configs on the following nodes: 10.40.2.23, 10.40.3.23.
When you start ClickHouse on these nodes, you will just register new replicas, and all data inserted into any node under the same /clickhouse/tables/{shard}/table_name ZooKeeper path will be replicated to the other nodes.
You need to understand that data replication and query distribution are orthogonal, independent concepts in ClickHouse.
Engine=Distributed knows nothing about ReplicatedMergeTree and is most of the time used only for distributing queries: during SELECT it fetches data from the underlying tables, pre-aggregates it where possible, and performs the final aggregation and sorting on the node where the distributed query was initiated; during INSERT into an Engine=Distributed table, the data is split into temporary *.bin files according to the sharding key expression and then pushed to one replica of the selected shard according to remote_servers.
Engine=ReplicatedMergeTree knows nothing about Distributed tables and remote_servers. It uses ZooKeeper during startup just to register a new "replica" and to store the names of new data parts for replication, and it fetches data parts directly from another replica registered under the same ZooKeeper path.
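As an illustration of that separation (a minimal sketch, not the exact schema from this thread; the column names and types are assumptions, only the table names, macros, and cluster name are taken from the output above):
-- Replication: a local table, replicated via the ZooKeeper path built from the {shard}/{replica} macros
CREATE TABLE default.table1_local ON CLUSTER cluster1
(
    req_time DateTime,     -- assumed type
    client_id UInt64,      -- assumed type
    service_id UInt64      -- assumed type
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/table1_local', '{replica}')
PARTITION BY toYYYYMM(req_time)
ORDER BY (toYYYYMM(req_time), client_id, service_id);

-- Distribution: a thin routing layer over the local tables; it stores no data itself
CREATE TABLE default.table1 ON CLUSTER cluster1 AS default.table1_local
ENGINE = Distributed('cluster1', 'default', 'table1_local', rand());
Only the *_local table holds data; the Distributed table is just metadata, which is why the backup discussion above focuses on the underlying tables.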
Opened a new issue because the previous one, #286, was closed but the question still exists. We have two CH clusters:
┌─cluster──┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name─┬─host_address─┬─port─┬─is_local─┬─user────┬─default_database─┬─errors_count─┬─estimated_recovery_time─┐
│ cluster1 │         1 │            1 │           1 │ x.y.z.5   │ x.y.z.5      │ 9000 │        0 │ default │                  │            0 │                       0 │
│ cluster1 │         1 │            1 │           2 │ x.y.z.6   │ x.y.z.6      │ 9000 │        1 │ default │                  │            0 │                       0 │
│ cluster1 │         2 │            1 │           1 │ x.y.z.7   │ x.y.z.7      │ 9000 │        0 │ default │                  │            0 │                       0 │
│ cluster1 │         2 │            1 │           2 │ x.y.z.8   │ x.y.z.8      │ 9000 │        0 │ default │                  │            0 │                       0 │
└──────────┴───────────┴──────────────┴─────────────┴───────────┴──────────────┴──────┴──────────┴─────────┴──────────────────┴──────────────┴─────────────────────────┘
The goal is to back up ch1 and restore it to ch2. On ch1 we made two remote full backups to S3: one on a node of shard1 and one on a node of shard2. On ch2:
The data is restored on one node of shard1 and replicated to the other node in shard1 - that's expected. BUT for some reason the data also ends up (restored or replicated) on the nodes of shard2. So we get 2 shards containing the same data - that's totally unexpected.
Issues: