Altinity / clickhouse-backup

Tool for easy backup and restore for ClickHouse® using object storage for backup files.

In a replicated and distributed setup is it sufficient to take backup from only one node? #463

Closed SohamChakraborty closed 2 years ago

SohamChakraborty commented 2 years ago

Hello,

I am thinking of using this tool to back up the data from our ClickHouse servers. Before I do the backup, I have some usage questions that I want to clarify.

# cat remote.xml 
<yandex>
    <remote_servers incl="clickhouse_remote_servers">
        <data>
            <shard>
                <weight>1</weight>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>host1.foo.bar.com</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>host2.foo.bar.com</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <weight>1</weight>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>host3.foo.bar.com</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>host4.foo.bar.com</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <weight>1</weight>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>host5.foo.bar.com</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>host6.foo.bar.com</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <weight>1</weight>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>host7.foo.bar.com</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>host8.foo.bar.com</host>
                    <port>9000</port>
                </replica>
            </shard>
        </data>
    </remote_servers>
</yandex>

This is the configuration of the data cluster. My question is: do I have to run clickhouse-backup on all of the nodes listed above, or is it sufficient to run the backup on only one node because all the others hold essentially the same data? I am a ClickHouse newbie, so please bear with me if this is a basic question.

Slach commented 2 years ago

For a multi-shard/multi-replica installation:

To back up, run create_remote <backup_name> only on the first node in each shard. To restore, run restore_remote --rm --schema <backup_name> on each node in each shard, and after that run restore_remote --data <backup_name> only on the first node in each shard.

This minimizes data conflicts between different replicas.
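
To make the per-host steps concrete, here is a rough sketch that reads a cluster definition like the one in the question, picks the first replica of each shard, and prints which command would run where. It only builds a plan and does not connect to any server. The helper names (shard_hosts, backup_plan, restore_plan), the backup name my_backup, and the trimmed two-shard XML are illustrative assumptions; the clickhouse-backup subcommands and flags are the ones quoted in the answer above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cluster definition from the question (two of the
# four shards), kept inline so the sketch is self-contained.
REMOTE_XML = """
<yandex>
    <remote_servers>
        <data>
            <shard>
                <replica><host>host1.foo.bar.com</host><port>9000</port></replica>
                <replica><host>host2.foo.bar.com</host><port>9000</port></replica>
            </shard>
            <shard>
                <replica><host>host3.foo.bar.com</host><port>9000</port></replica>
                <replica><host>host4.foo.bar.com</host><port>9000</port></replica>
            </shard>
        </data>
    </remote_servers>
</yandex>
"""


def shard_hosts(xml_text, cluster="data"):
    """Return one list of replica hostnames per shard, in file order."""
    root = ET.fromstring(xml_text)
    shards = root.findall(f"./remote_servers/{cluster}/shard")
    return [[r.findtext("host") for r in s.findall("replica")] for s in shards]


def backup_plan(shards, backup_name):
    """create_remote runs only on the first replica of each shard."""
    return [
        (replicas[0], f"clickhouse-backup create_remote {backup_name}")
        for replicas in shards
    ]


def restore_plan(shards, backup_name):
    """Schema restore on every node first, then data on one node per shard."""
    plan = []
    # Step 1: recreate the schema on every replica of every shard.
    for replicas in shards:
        for host in replicas:
            plan.append(
                (host, f"clickhouse-backup restore_remote --rm --schema {backup_name}")
            )
    # Step 2: restore data only on the first replica of each shard.
    for replicas in shards:
        plan.append(
            (replicas[0], f"clickhouse-backup restore_remote --data {backup_name}")
        )
    return plan


if __name__ == "__main__":
    shards = shard_hosts(REMOTE_XML)
    # "my_backup" is a placeholder name, not something from the thread.
    for host, cmd in backup_plan(shards, "my_backup") + restore_plan(shards, "my_backup"):
        print(f"{host}: {cmd}")
```

"First node" is taken here to mean the first replica listed for each shard in the config; any single replica per shard would serve the same purpose, as long as the same choice is used consistently.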