Altinity / clickhouse-backup

Tool for easy backup and restore for ClickHouse® using object storage for backup files.
https://altinity.com

download or restore_remote fails with downloadDiffParts error #582

Closed. bmailhe closed this issue 1 year ago.

bmailhe commented 1 year ago

Steps to reproduce:

On a cluster

On another cluster

Trying with another diff backup, it fails on the parent backup:

# clickhouse-backup -c /etc/clickhouse-backup/config.yml restore_remote rZhV-PARTIAL-001-full_orderbooks_v1-snapshot_v1-221212090143 --rm
2022/12/13 10:18:44.809565  info clickhouse connection prepared: tcp://localhost:9000 run ping logger=clickhouse
2022/12/13 10:18:44.811766  info clickhouse connection open: tcp://localhost:9000 logger=clickhouse
2022/12/13 10:18:44.811793  info SELECT value FROM `system`.`build_options` where name='VERSION_INTEGER' logger=clickhouse
2022/12/13 10:18:44.815221  info SELECT * FROM system.disks; logger=clickhouse
2022/12/13 10:18:44.818403  info SELECT max(toInt64(bytes_on_disk * 1.02)) AS max_file_size FROM system.parts logger=clickhouse
2022/12/13 10:18:44.821748  info SELECT count() AS is_macros_exists FROM system.tables WHERE database='system' AND name='macros' logger=clickhouse
2022/12/13 10:18:44.825103  info SELECT * FROM system.macros logger=clickhouse
2022/12/13 10:18:46.525132  info done                      backup=rZhV-PARTIAL-001-full_orderbooks_v1-snapshot_v1-221212090143 duration=581ms logger=backuper operation=download size=15.36KiB table_metadata=full_orderbooks_v1.snapshot_v1
2022/12/13 11:56:42.067830  info done                      backupName=rZhV-FULL-full_orderbooks_v1-snapshot_v1-221206130904 duration=723ms logger=backuper operation=downloadTableMetadataIfNotExists size=11.33KiB table_metadata_diff=full_orderbooks_v1.snapshot_v1
2022/12/13 11:56:42.220543 error can't acquire semaphore during downloadDiffParts: context canceled logger=backuper operation=downloadDiffParts
2022/12/13 11:56:42.429295  info clickhouse connection closed logger=clickhouse
2022/12/13 11:56:42.429315 error one of Download go-routine return error: one of downloadDiffParts go-routine return error: full_orderbooks_v1.snapshot_v1 201905_0_99_19 not found on rZhV-FULL-full_orderbooks_v1-snapshot_v1-221206130904 and all required backups sequence

Am I doing something wrong?

bmailhe commented 1 year ago

Downloading also gives me an error:

# clickhouse-backup -c /etc/clickhouse-backup/config.yml download rZhV-FULL-full_orderbooks_v1-snapshot_v1-221206130904
2022/12/13 13:21:00.426602  info clickhouse connection prepared: tcp://localhost:9000 run ping logger=clickhouse
2022/12/13 13:21:00.428874  info clickhouse connection open: tcp://localhost:9000 logger=clickhouse
2022/12/13 13:21:00.429022  info SELECT value FROM `system`.`build_options` where name='VERSION_INTEGER' logger=clickhouse
2022/12/13 13:21:00.430854  info SELECT * FROM system.disks; logger=clickhouse
2022/12/13 13:21:00.433589  info SELECT max(toInt64(bytes_on_disk * 1.02)) AS max_file_size FROM system.parts logger=clickhouse
2022/12/13 13:21:00.438443  info SELECT count() AS is_macros_exists FROM system.tables WHERE database='system' AND name='macros' logger=clickhouse
2022/12/13 13:21:00.442009  info SELECT * FROM system.macros logger=clickhouse
2022/12/13 13:21:02.567258  info done                      backup=rZhV-FULL-full_orderbooks_v1-snapshot_v1-221206130904 duration=997ms logger=backuper operation=download size=11.33KiB table_metadata=full_orderbooks_v1.snapshot_v1
2022/12/13 13:21:02.567444  info clickhouse connection closed logger=clickhouse
2022/12/13 13:21:02.567472 error one of Download go-routine return error: /var/lib/clickhouse/backup/rZhV-FULL-full_orderbooks_v1-snapshot_v1-221206130904/shadow/full_orderbooks_v1/snapshot_v1/default/201904_0_137_12 not found after download backup

If I download the backup using rclone sync and then restore it, it works.
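
For reference, the manual workaround was essentially this (the remote and bucket names below are placeholders for my rclone remote):

```
# hypothetical remote:bucket; adjust to your rclone config
rclone sync remote:bucket/rZhV-FULL-full_orderbooks_v1-snapshot_v1-221206130904 \
  /var/lib/clickhouse/backup/rZhV-FULL-full_orderbooks_v1-snapshot_v1-221206130904
clickhouse-backup restore rZhV-FULL-full_orderbooks_v1-snapshot_v1-221206130904 --rm
```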

Slach commented 1 year ago

could you share

clickhouse-backup version
clickhouse-backup print-config

without sensitive credentials?

Slach commented 1 year ago

could you also share the clickhouse-backup list remote output?

bmailhe commented 1 year ago

Version

clickhouse-backup --version
Version:         2.1.2
Git Commit:      376c8dd499c3f8c4a0f873670345c8b53ef5c44a
Build Date:      2022-10-23
Config

````
general:
  remote_storage: s3
  max_file_size: 0
  disable_progress_bar: true
  backups_to_keep_local: 0
  backups_to_keep_remote: 0
  log_level: info
  allow_empty_backups: false
  download_concurrency: 50
  upload_concurrency: 50
  restore_schema_on_cluster: ""
  upload_by_part: true
  download_by_part: true
  restore_database_mapping: {}
  retries_on_failure: 3
  upload_retries_pause: 100ms
  watch_interval: 1h
  full_interval: 24h
  watch_backup_name_template: shard{shard}-{type}-{time:20060102150405}
  retriesduration: 100ms
  watchduration: 1h0m0s
  fullduration: 24h0m0s
clickhouse:
  username: xxx
  password: "xxx"
  host: localhost
  port: 9000
  disk_mapping: {}
  skip_tables:
    - system.*
    - default.*
  timeout: 5m
  freeze_by_part: false
  freeze_by_part_where: ""
  use_embedded_backup_restore: false
  embedded_backup_disk: ""
  secure: false
  skip_verify: false
  sync_replicated_tables: false
  log_sql_queries: true
  config_dir: /etc/clickhouse-server/
  restart_command: systemctl restart clickhouse-server
  ignore_not_exists_error_during_freeze: true
  check_replicas_before_attach: true
  tls_key: ""
  tls_cert: ""
  tls_ca: ""
  debug: false
s3:
  access_key: xxx
  secret_key: xxx
  bucket: xxx
  endpoint: xxx
  region: us-west-000
  acl: ""
  assume_role_arn: ""
  force_path_style: false
  path: ""
  disable_ssl: false
  compression_level: 1
  compression_format: tar
  sse: ""
  disable_cert_verification: false
  use_custom_storage_class: false
  storage_class: STANDARD
  concurrency: 100
  part_size: 0
  max_parts_count: 20000
  allow_multipart_download: false
  debug: false
gcs:
  credentials_file: ""
  credentials_json: ""
  credentials_json_encoded: ""
  bucket: ""
  path: ""
  compression_level: 1
  compression_format: tar
  debug: false
  endpoint: ""
  storage_class: STANDARD
cos:
  url: ""
  timeout: 2m
  secret_id: ""
  secret_key: ""
  path: ""
  compression_format: tar
  compression_level: 1
  debug: false
api:
  listen: localhost:7171
  enable_metrics: true
  enable_pprof: false
  username: ""
  password: ""
  secure: false
  certificate_file: ""
  private_key_file: ""
  create_integration_tables: false
  integration_tables_host: ""
  allow_parallel: false
ftp:
  address: ""
  timeout: 2m
  username: ""
  password: ""
  tls: false
  path: ""
  compression_format: tar
  compression_level: 1
  concurrency: 24
  debug: false
sftp:
  address: ""
  port: 22
  username: ""
  password: ""
  key: ""
  path: ""
  compression_format: tar
  compression_level: 1
  concurrency: 1
  debug: false
azblob:
  endpoint_schema: https
  endpoint_suffix: core.windows.net
  account_name: ""
  account_key: ""
  sas: ""
  use_managed_identity: false
  container: ""
  path: ""
  compression_level: 1
  compression_format: tar
  sse_key: ""
  buffer_size: 0
  buffer_count: 3
  max_parts_count: 10000
  timeout: 15m
custom:
  upload_command: ""
  download_command: ""
  list_command: ""
  delete_command: ""
  command_timeout: 4h
  commandtimeoutduration: 4h0m0s
````

List remote

$ clickhouse-backup list remote
G8tJ-FULL-full_orderbooks_v1-snapshot_v2-221205160449          3.12TiB     06/12/2022 09:47:14   remote                                                            , regular
rZhV-FULL-full_orderbooks_v1-snapshot_v1-221206130904          7.72TiB     06/12/2022 15:56:23   remote                                                            , regular
rZhV-PARTIAL-001-full_orderbooks_v1-snapshot_v1-221212090143   111.82GiB   12/12/2022 09:20:37   remote   +rZhV-FULL-full_orderbooks_v1-snapshot_v1-221206130904   tar, regular
G8tJ-PARTIAL-001-full_orderbooks_v1-snapshot_v2-221212130038   3.12TiB     12/12/2022 13:01:52   remote   +G8tJ-FULL-full_orderbooks_v1-snapshot_v2-221205160449   tar, regular

FYI the "FULL" backups are uploaded using rclone, I think the problem comes from the fact some directories are not compressed with tar ?

Slach commented 1 year ago

FYI the "FULL" backups are uploaded using rclone, I think the problem comes from the fact some directories are not compressed with tar ?

yes, this is the root cause

for compression_format: tar, clickhouse-backup expects backup_name/db/table/disk_name/part_name.tar files

also, during download it expects a "Files" section in backup_name/metadata/db/table_name.json, which is saved only when the backup was uploaded with clickhouse-backup
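
In other words, a remote backup that clickhouse-backup can download looks roughly like this (an illustrative sketch of the two expectations above, reusing the table and part names from your logs; exact prefixes may differ):

```
backup_name/
  metadata/
    full_orderbooks_v1/
      snapshot_v1.json          # must contain the "Files" section written by clickhouse-backup upload
  full_orderbooks_v1/
    snapshot_v1/
      default/                  # disk name
        201904_0_137_12.tar     # one tar archive per data part
```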

if you want to improve upload/download speed to S3 and reduce memory usage, you can integrate rclone via remote_storage: custom inside clickhouse-backup, but I'm not sure how to create an incremental backup with rclone

Moreover, if you'd like, you could integrate restic instead of rclone with remote_storage: custom

see https://github.com/Altinity/clickhouse-backup/tree/master/test/integration/restic and https://github.com/Altinity/clickhouse-backup/blob/master/test/integration/config-custom-restic.yml for details
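
Either way, the wiring goes through the same keys that appear in your print-config above. A minimal sketch of a custom-storage setup (the command strings are placeholders, not working rclone/restic invocations; see the linked restic example for real ones):

```
general:
  remote_storage: custom
custom:
  # placeholders only; substitute real rclone or restic command lines,
  # e.g. from test/integration/config-custom-restic.yml
  upload_command: "<your upload command>"
  download_command: "<your download command>"
  list_command: "<your list command>"
  delete_command: "<your delete command>"
  command_timeout: 4h
```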

bmailhe commented 1 year ago

Thanks for the explanation. I did more tests with upload_command and download_command set to rclone, and it works!

custom:
  upload_command: "rclone sync --fast-list --b2-disable-checksum "
  download_command: "rclone sync --fast-list --b2-disable-checksum "

then

# create and upload a backup
clickhouse-backup create full-backup
clickhouse-backup upload full-backup

# add data

# create a partial backup
clickhouse-backup create incr-backup
clickhouse-backup upload --diff-from-remote=full-backup incr-backup

# restore from remote
clickhouse-backup delete local full-backup
clickhouse-backup delete local incr-backup
clickhouse-backup restore_remote incr-backup --rm
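
If list_command is wired up as well, clickhouse-backup list remote should show the incremental backup pointing back at the full one, similar to the earlier listing (illustrative output, sizes and dates trimmed):

```
$ clickhouse-backup list remote
full-backup   ...   remote                  tar, regular
incr-backup   ...   remote   +full-backup   tar, regular
```
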
Slach commented 1 year ago

check remote backup sizes via your S3 backend's UI

not sure --diff-from-remote will work properly for rclone without changes to the "custom" section

moreover, if you keep local backups, they will start to consume extra disk space when you run a mutation (ALTER TABLE ... DELETE / UPDATE) or after any background parts merge, because the hard-linked parts in the backup then diverge from the live data

Slach commented 1 year ago

feel free to make a Pull Request with examples for rclone integration