Altinity / clickhouse-backup

Tool for easy ClickHouse backup and restore using object storage for backup files.
https://altinity.com

`upload / download --resume --tables=t1,t2` after `upload / download --resume --tables=t1` doesn't process t2 #840

Open amirshabanics opened 4 months ago

amirshabanics commented 4 months ago

Hi, assume we have backup files a, b, c and tables z, x, y. In step 1 I restore ClickHouse from all three files for table z. Then I want to do the same for x and y, but this error is thrown:

warn 'base_backup_2024-02-15T00:30:01+03:30' doesn't contains tables for restore backup=base_backup_2024-02-15T00:30:01+03:30 operation=restore
amirshabanics commented 4 months ago

I understand what happens. The metadata of the files stored in /backup is not updated. I have the full metadata, and when I copy it into the directory and run restore with the --resume param, it downloads the new table successfully.

Slach commented 4 months ago

Sorry, I don't understand what the issue is. Could you provide the full backup command sequence?

Recently I saw someone try restore_remote --schema and after that restore_remote; the first command downloads metadata/db/table.json with metadata_only: true, which does not allow downloading the data files again...

Slach commented 4 months ago

Any news from your side?

amirshabanics commented 4 months ago

Assume I run the command clickhouse-backup restore --tables='A,B' backup_file and then run clickhouse-backup restore --tables='A,B,C,D' backup_file.

We know that backup_file contains all of the A, B, C, D tables, but it raises an error that the table doesn't exist. We must remove all data in /var/lib/clickhouse/backup and run clickhouse-backup restore --resume --tables='A,B,C,D' backup_file to get the new tables.

Slach commented 4 months ago

Are you sure you are using the restore command and not restore_remote? The --resume option is not present in the restore command.

amirshabanics commented 4 months ago

Oh sorry, you are right. This is the simple code I run:

tables=(A B C D)
tables_with_comma=$(echo "${tables[@]}" | tr ' ' ',')

clickhouse-backup list > /tmp/backup_list
BASE=$(grep remote /tmp/backup_list | awk '{print $1}' | grep base | sort | tail -n 1)
clickhouse-backup download --resume --tables="$tables_with_comma" "$BASE"
for table in "${tables[@]}"; do
  echo "----- Restoring table: ${table}"
  clickhouse-backup restore --drop -t "${table}" "$BASE"
done
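As an aside, the space-to-comma join in the script can also be done purely in the shell with IFS, avoiding the tr call; a minimal sketch:

```shell
# Join a bash array with commas by setting IFS inside a subshell,
# so the caller's IFS is left untouched.
tables=(A B C D)
tables_with_comma=$(IFS=,; echo "${tables[*]}")
echo "$tables_with_comma"   # → A,B,C,D
```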
amirshabanics commented 4 months ago
tables=(A B C D)

When you add one more table to this array, the restore command doesn't work well: it raises an error that the table doesn't exist in the backup file.

Slach commented 4 months ago

The first execution of download --resume stores the download state in /var/lib/clickhouse/backup/backup_name/download.state.

On the second execution, after you change tables=, it finds backup_name/metadata.json recorded in /var/lib/clickhouse/backup/backup_name/download.state and decides the whole backup is already downloaded.

You could just remove /var/lib/clickhouse/backup/$BASE/download.state (or upload.state for uploads) if you change tables=(A B C) to tables=(A B C D).

I will try to fix it.
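A possible stop-gap until the fix lands, sketched here under assumptions: the .tables_requested marker file is this sketch's own invention, not part of clickhouse-backup. The idea is to remember which tables were requested last time and drop download.state whenever the list changes, so the next `download --resume` re-evaluates every table.

```shell
# Sketch: reset the resumable state when the requested table list changes.
# BACKUP_DIR assumes the default clickhouse-backup layout; adjust to your config.
BACKUP_DIR="${BACKUP_DIR:-/var/lib/clickhouse/backup}"

maybe_reset_state() {
  base="$1" tables="$2"
  marker="$BACKUP_DIR/$base/.tables_requested"   # marker file owned by this script
  if [ -f "$marker" ] && [ "$(cat "$marker")" != "$tables" ]; then
    rm -f "$BACKUP_DIR/$base/download.state"     # force --resume to re-check all tables
  fi
  mkdir -p "$BACKUP_DIR/$base"
  printf '%s' "$tables" > "$marker"
}

# usage (requires clickhouse-backup on PATH):
# maybe_reset_state "$BASE" "$tables_with_comma"
# clickhouse-backup download --resume --tables="$tables_with_comma" "$BASE"
```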

amirshabanics commented 4 months ago

Thanks. If it needs just a simple fix, can I do it?

Slach commented 4 months ago

Not sure it will be a simple fix. From our side we first need to implement a different format for the resumableState files: https://github.com/Altinity/clickhouse-backup/issues/828

amirshabanics commented 4 months ago

Why do we need to add --resume? Can't it check by itself whether it needs to download or not?

Slach commented 4 months ago

Your "$BASE" backup just hasn't changed since the previous download; it still contains the download.state file, and /var/lib/clickhouse/backup/backup_name/metadata.json contains only the A B C tables.

I don't see your whole workflow and don't know your goals; you shared it only partially, so I can't suggest a proper command sequence.

I don't know why you need --resume, but without --resume a download for a backup which already exists will fail.

--resume is turned on by default since 2.2.0 via the USE_RESUMABLE_STATE config option:

general:
  use_resumable_state: true
amirshabanics commented 4 months ago

This is my whole code, but it is simple. I have another ClickHouse on another server and I back it up to S3, then I restore it to this ClickHouse every 2 hours. The ClickHouse on the other server backs up all tables, but my ClickHouse needs only some of them, and (for business reasons) the set of tables it needs may differ. If I don't pass the --resume param, it doesn't check that it has downloaded all tables for a backup file. When I remove the metadata and download state and then run it with the --resume param, it downloads all the remaining tables.

Slach commented 4 months ago

In this case you can try to use:

USE_RESUMABLE_STATE=0 clickhouse-backup download --schema --tables=$tables_with_comma "$BASE"
clickhouse-backup download --resume --tables=$tables_with_comma "$BASE"

The first execution will download the $BASE/metadata folder; the second execution will download $BASE/metadata + $BASE/shadow and will use download.state.

amirshabanics commented 4 months ago

OK, but when I change tables_with_comma, is this still correct or not?

Slach commented 4 months ago

=( If $BASE has not changed since the previous run, then it will fail. I'll try to fix it; wait until 2.5.0 is released.

amirshabanics commented 4 months ago

Can I somehow run it until you fix it? Maybe by deleting download.state and metadata.json?

Slach commented 4 months ago

@amirshabanics wait until 2.5.0 is released; after that your use case will work.