Altinity / clickhouse-backup

Tool for easy backup and restore for ClickHouse® using object storage for backup files.
https://altinity.com

Not able to restore backups - can't parse metadata.cache.S3 file #949

Closed LukaszMarchewka closed 4 months ago

LukaszMarchewka commented 4 months ago

The restore_remote operation isn't deterministic and sometimes throws the error: can't parse /tmp/.clickhouse-backup-metada.cache.S3 to map[string]Backup


The problem happens only sometimes (as I mentioned, it isn't deterministic). I checked /tmp/.clickhouse-backup-metada.cache.S3 after the failure and the file was correct. The problem is here: https://github.com/Altinity/clickhouse-backup/blob/master/pkg/storage/general.go#L92 but I don't know the root cause. I have two ideas:

The problem is even bigger: the error "blocks" the entire tool, so you are not able to create new backups, list them, and so on, because all commands use the same metadata file.

Slach commented 4 months ago

The problem is here: https://github.com/Altinity/clickhouse-backup/blob/master/pkg/storage/general.go#L92

No — notice that the error says can't parse, not can't read.

The file system is not strongly consistent.

I think this is the root cause

What is your environment?

Do you use a Docker container or a standalone Linux server?

Slach commented 4 months ago

Are you trying to run multiple clickhouse-backup instances concurrently?

LukaszMarchewka commented 4 months ago

No, never. Always one by one. But the second one starts almost immediately after the previous one completes.

LukaszMarchewka commented 4 months ago

But there may be a backup_remote command running in the background. There is a sidecar container doing backups.

LukaszMarchewka commented 4 months ago

@Slach I think you are right: each execution of clickhouse-backup creates a new metadata file. I run backup jobs in the background (in a sidecar container), so they can overwrite the file.

LukaszMarchewka commented 4 months ago

@Slach thank you :)

Slach commented 4 months ago

> But there may be a backup_remote command running in the background. There is a sidecar container doing backups.

Do you mean you execute multiple sidecar containers at the same time in the same pod? That is a bad idea.

Better: re-deploy your Bitnami Helm chart and add a secondary container to your pod that runs clickhouse-backup in server mode. In this case you can trigger a backup with:

INSERT INTO system.backup_actions(command) VALUES('create_remote ...')

See https://github.com/Altinity/clickhouse-backup/blob/master/Examples.md#how-to-use-clickhouse-backup-in-kubernetes for examples (you will need to adapt it to your Bitnami Helm chart installation; I hope the main idea is clear).

LukaszMarchewka commented 4 months ago

There is only one sidecar container doing backups, but I execute the restore operation via kubectl exec.

Slach commented 4 months ago

Yep, in this case the /tmp/*.S3 file will be overwritten multiple times, OK. This issue is solved; let's figure out the non-restored incremental backup ...