Altinity / clickhouse-backup

Tool for easy backup and restore for ClickHouse® using object storage for backup files.
https://altinity.com

How to work with incremental backups on S3 #451

Closed. anikin-aa closed this issue 2 years ago.

anikin-aa commented 2 years ago

Hi there!

I am trying to set up disaster recovery for ClickHouse (two ClickHouse clusters in separate DCs) via backup and restore.

My approach is to take a full backup plus periodic incremental backups on DC1, store them in S3, and pre-download them on DC2.

Is it possible to download the full backup and the increments on DC2 from S3 and restore them as local backups?

Slach commented 2 years ago

Unfortunately, clickhouse-backup currently handles S3 disks incorrectly and backs up only the local metadata files instead of a full S3 snapshot: https://github.com/AlexAkulov/clickhouse-backup/issues/447

I would propose using rclone.org to mirror the S3 bucket behind your s3 disk after executing `clickhouse-backup upload --diff-from-remote`, or adding versioning with an AWS-side retention policy to the S3 bucket used for the s3 disks.
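For the mirroring option, a minimal sketch, assuming two rclone remotes (the names `dc1` and `dc2` and the bucket name are hypothetical and would come from your own rclone.conf):

```sh
# Mirror the bucket backing the s3 disk from the DC1 S3 endpoint to the DC2 one.
rclone sync dc1:clickhouse-s3-disk dc2:clickhouse-s3-disk --progress

# Typically chained right after the incremental upload, e.g.:
# clickhouse-backup upload backup2 --diff-from-remote=backup1 && rclone sync dc1:clickhouse-s3-disk dc2:clickhouse-s3-disk
```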

anikin-aa commented 2 years ago

@Slach, thanks for the quick answer. But I am not using S3 disks, I am just saving backups of MergeTree tables, or maybe I misunderstand what S3 disks are.

Slach commented 2 years ago

Oops, sorry for misleading you.

> Is it possible to download full backup and increments on DC2 from S3 and restore them as local backups?

Yes, it is possible:


DC1# clickhouse-backup create_remote backup1 && clickhouse-backup delete local backup1
DC1# clickhouse-backup create_remote backup2 --diff-from-remote=backup1 && clickhouse-backup delete local backup2

DC2# clickhouse-backup download backup2
DC2# clickhouse-backup restore --rm backup2

During the download of `backup2`, clickhouse-backup will fetch all parts from backup2 plus the parts from backup1 that were marked as "required" when `clickhouse-backup create_remote backup2 --diff-from-remote=backup1` was executed, so you can successfully restore backup2 as a full backup.
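As a quick sanity check (just a usage suggestion, not something prescribed in this thread), you can list the local backups on DC2 after the download and confirm that backup2 arrived before restoring it:

```sh
clickhouse-backup list local           # backup2 should appear as a local backup
clickhouse-backup restore --rm backup2
```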
anikin-aa commented 2 years ago

@Slach, yes, I read about this behavior in the docs, but I would like to save some time on backup downloading by downloading each backup from S3 as soon as a new increment is created.

Slach commented 2 years ago

The download command downloads backup data from the remote storage; if your configured storage type is `s3`, the download will come from S3.
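For context, the remote storage is selected in the clickhouse-backup config file; below is a minimal sketch of the relevant section, where the exact key names and all values are assumptions that may differ between versions:

```yaml
# Sketch of /etc/clickhouse-backup/config.yml (key names may vary by version).
general:
  remote_storage: s3          # upload/download go to the s3 section below
s3:
  bucket: my-backup-bucket    # hypothetical bucket name
  region: us-east-1
  access_key: "<ACCESS_KEY>"
  secret_key: "<SECRET_KEY>"
  path: clickhouse            # optional prefix inside the bucket
```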

anikin-aa commented 2 years ago

@Slach, yes, I got that too, thanks.

But my idea is the following:

  1. A full backup is taken on DC1 and saved to S3.

  2. backup[1] is downloaded from S3 locally on DC2.

  3. An incremental backup is taken on DC1 and saved to S3.

  4. backup[3] is downloaded from S3 locally on DC2.

...

And so on. In case of a disaster on DC1, restore all of the local backups on DC2 (a rough sketch of this flow follows below).

Will this work?
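For illustration, the schedule above could look roughly like the following; the backup names and the choice of diffing each increment against the previous backup are assumptions for the sketch, not something the tool mandates:

```sh
# DC1: initial full backup, pushed to S3 and removed locally afterwards.
clickhouse-backup create_remote full_1
clickhouse-backup delete local full_1

# DC1: later increments, each diffed against the previous remote backup.
clickhouse-backup create_remote incr_2 --diff-from-remote=full_1
clickhouse-backup delete local incr_2

# DC2: pre-download each backup as soon as it lands in S3.
clickhouse-backup download full_1
clickhouse-backup download incr_2

# DC2, after a disaster on DC1: restoring the newest backup is enough,
# since its download already pulled in the parts it requires.
clickhouse-backup restore --rm incr_2
```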

Slach commented 2 years ago

Is your S3 stored in DC1 or somewhere else?

Yes, it will work, but it will allocate more disk space on DC2, because backup[1] will be fully downloaded.

If you execute only `download backup[3]`, then backup[1] will be only partially downloaded.

anikin-aa commented 2 years ago

@Slach, nope, my S3 is in DC2. My assumption is that I will save some time on backup downloading in case of a disaster.

> If you execute only `download backup[3]`, then backup[1] will be only partially downloaded.

OK, nice, thanks!

And the last question: should I restore only the last local backup, backup[3]? Or the full chain starting from the full backup[1]?

Slach commented 2 years ago

> should I restore only the last local backup, backup[3]? Or the full chain starting from the full backup[1]?

`clickhouse-backup restore --rm backup[3]` will be enough in this case.

> my S3 is in DC2. My assumption is that I will save some time on backup downloading in case of a disaster.

Yes, using a locally pre-downloaded backup takes less time than downloading the remote backup during a restore, but you will allocate twice the space: once in S3 and once on the DC2 ClickHouse server.

anikin-aa commented 2 years ago

@Slach, thank you!