Altinity / clickhouse-backup

Tool for easy backup and restore for ClickHouse® using object storage for backup files.
https://altinity.com

Backing up a cluster #884

Closed BoweFlex closed 8 months ago

BoweFlex commented 8 months ago

I've been trying to think through backing up a cluster as a whole, and am curious if this is something the tool already accounts for that I'm just missing. We have a cluster of three clickhouse servers, and I was originally going to create a cron job (or run clickhouse-backup server --watch) on one server to take backups. However, I'm concerned that if we lost that server we would lose backups even while quorum is maintained by the other two. I'm not sure I want to take three separate backups, one per server, either. Is this something clickhouse-backup handles, or do I need to find a way to run the backups on an individual server depending on which one's available?

Slach commented 8 months ago

We have a cluster of three clickhouse servers, and I was originally going to create a cron job (or run clickhouse-backup server --watch) on one server to take backups.

Do you have 3 servers with 1 shard and 3 replicas, or 3 separate shards with 1 replica in each shard?

2 replicas per shard should be enough for most cases.

See https://github.com/Altinity/clickhouse-backup/blob/master/Examples.md#how-to-back-up--restore-a-sharded-cluster

and https://github.com/Altinity/clickhouse-backup/blob/master/Examples.md#how-to-use-clickhouse-backup-in-kubernetes

However, I'm concerned that if we lost that server we would lose backups even while quorum is maintained by the other two.

Backups should be stored on remote_storage; don't store backups locally, it will just allocate unnecessary disk space.

If you lose the first server, the backups will not be lost.
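
As a rough sketch of that advice (assuming remote_storage, e.g. s3, is already configured in /etc/clickhouse-backup/config.yml; the backup name and schedule are placeholders), a nightly cron job on each server could look like:

    #!/usr/bin/env bash
    # hypothetical nightly backup script: create_remote creates the backup and uploads it
    # to the configured remote storage; the local copy is then deleted to free disk space
    set -euo pipefail
    BACKUP_NAME="daily-$(date +%Y-%m-%d)"
    clickhouse-backup create_remote "$BACKUP_NAME"
    clickhouse-backup delete local "$BACKUP_NAME"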

If you want to use --watch, you need to detect the current hostname in the command section of the clickhouse-backup container and check that it is the first replica in the shard (a -0-0$ regexp against $HOSTNAME).
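
For example (a hypothetical sketch, assuming clickhouse-operator style pod names where the first replica of the first shard ends in -0-0), the container command could gate the watch loop on the hostname:

    # run the --watch scheduler only on the pod whose hostname ends in -0-0,
    # so a single instance in the cluster takes the scheduled backups;
    # every other pod just runs the plain API server
    if [[ "$HOSTNAME" =~ -0-0$ ]]; then
        clickhouse-backup server --watch
    else
        clickhouse-backup server
    fi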

For restore, you still need to implement and invoke the restore command sequence described above.
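
A minimal per-shard restore sketch (the backup name is only a placeholder; the full sharded-cluster procedure is in the Examples.md link above) might be:

    # hypothetical: on one replica of each shard, restore the backup that was taken on
    # that shard; restore_remote downloads it from remote storage and restores schema and data
    clickhouse-backup restore_remote "daily-2024-01-01"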

BoweFlex commented 8 months ago

I believe we have 3 shards, each with its own replica. The configuration is:

<featbit_ch_cluster>
    <shard>
        <replica>
            <host>l-clickhouse101</host>
            <port>9000</port>
        </replica>
    </shard>
    <shard>
        <replica>
            <host>l-clickhouse102</host>
            <port>9000</port>
        </replica>
    </shard>
    <shard>
        <replica>
            <host>l-clickhouse103</host>
            <port>9000</port>
        </replica>
    </shard>
</featbit_ch_cluster>

Backups should be stored on remote_storage; don't store backups locally, it will just allocate unnecessary disk space.

If you lose the first server, the backups will not be lost.

Sorry, I may have been unclear with my second question. I'm not concerned about losing the backups themselves when the first server goes down. Rather, if I have a job that only runs on host1 to take backups and host1 goes down, then I will no longer be taking backups.

Slach commented 8 months ago

Look at

kubectl get svc --all-namespaces | grep chi

and use chi-{your_chi_name} as the endpoint in this case.

You can handle errors in the cron job, and the DNS endpoint will round-robin across the replicas when connecting.
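
For example (a hypothetical sketch, assuming clickhouse-backup runs as a sidecar exposing its REST API on the default port 7171 through the chi-{your_chi_name} service), a Kubernetes CronJob could trigger backups through the service and let DNS pick a live pod:

    # trigger a backup on whichever replica the service resolves to;
    # curl -f makes the job fail (so the CronJob can retry/alert) if the request errors out
    curl -sf -X POST "http://chi-your-chi-name:7171/backup/create" || exit 1
    # a follow-up POST to /backup/upload/<name> would push the new backup to remote storage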