YunoHost-Apps / borg_ynh

An experimental Borg implementation for YunoHost
https://www.borgbackup.org/
GNU Affero General Public License v3.0

[Enhancement] Borg keeps processing unchanged files as new files #138

Open Vitriia opened 1 year ago

Vitriia commented 1 year ago

Describe the bug

There is no problem with the backed-up data: it is deduplicated as usual.

But if your backups take much longer than you would normally expect with Borg, that seems to be normal with the current setup, for the reason below.

To back up files, YunoHost creates a temporary folder with a random name. But Borg uses the absolute path name to store entries in its files cache. So if YunoHost creates a new folder for each backup, Borg won't find these path names in its files cache and will recalculate the chunks for every file. https://borgbackup.readthedocs.io/en/stable/faq.html#i-am-seeing-a-added-status-for-an-unchanged-file

For example, one of my apps has 107k files for a total of 14 GB. It always took 1h30 to back up with 'yunohost backup create'. But with 3 manual 'borg create' runs, the first takes 1h30 and each subsequent one takes 10 min.
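The cache miss described above can be illustrated with a toy model. This is only a sketch, not Borg's code: the `lookup` function, the text-file "cache" and the staging paths are all made up for illustration, and Borg's real files cache also compares size, timestamp and inode.

```shell
#!/bin/sh
# Toy model only: Borg's real files cache also stores size, timestamp and inode.
# Here the "cache" is a plain text file holding one absolute path per line.

# lookup CACHE_FILE ABSOLUTE_PATH
# Prints "A" (treated as new, would be re-chunked) on a miss and records the
# path, or "U" (unchanged, skipped) on a hit.
lookup() {
    cache_file=$1
    abs_path=$2
    if grep -Fxq -- "$abs_path" "$cache_file"; then
        echo "U"
    else
        echo "$abs_path" >> "$cache_file"
        echo "A"
    fi
}

cache=$(mktemp)

# Same file, but a different (hypothetical) random staging folder on each run:
# both lookups miss.
lookup "$cache" "/tmp/staging.a1b2/data/photo.jpg"
lookup "$cache" "/tmp/staging.c3d4/data/photo.jpg"

# Same file under a stable path: the second lookup is a hit.
lookup "$cache" "/srv/stable/data/photo.jpg"
lookup "$cache" "/srv/stable/data/photo.jpg"

rm -f "$cache"
```

With a path that changes on every run, the lookup can never hit, which matches the 1h30 on every 'yunohost backup create' versus 10 min on repeated manual runs from a fixed directory.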

Steps to see if you have this issue too

In your '/etc/yunohost/hooks/backup_method/borgXXX' file, comment out this line:

borg create "$repo::_${name}-${current_date}" ./ 2>&1 >/dev/null | log_with_timestamp

and add this one below the commented-out line:

borg create --list --filter=AME --stats "$repo::_${name}-${current_date}" ./ 2>> /path/to/log/file | log_with_timestamp

From the next backup on, your log file will be filled with the list of files that are Added (considered new by Borg), Modified, or in Error.

The log file should show a bunch of lines beginning with 'A /path/to/files'. If the number of such lines equals the number of files backed up: bravo! Borg is treating all your backed-up files as new and recalculating the chunks for each of them.
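The comparison above can be done with two small helpers. This is a minimal sketch: `count_status` and `count_files` are names made up here, and the sample log and directory are generated on the fly only so the snippet runs on its own. On a real system you would point them at your own log file and at the folder that was backed up.

```shell
#!/bin/sh
# count_status LOG_FILE STATUS
# Counts the log lines that start with the given one-letter status followed
# by a space, for example "A /path/to/file".
count_status() {
    grep -c "^$2 " "$1"
}

# count_files DIRECTORY
# Counts the regular files below a directory.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Self-contained demo with made-up data (not real Borg output).
demo_dir=$(mktemp -d)
demo_log=$(mktemp)
touch "$demo_dir/one" "$demo_dir/two"
printf 'A %s\nA %s\n' "$demo_dir/one" "$demo_dir/two" > "$demo_log"

added=$(count_status "$demo_log" A)
total=$(count_files "$demo_dir")
echo "added: $added, files: $total"
if [ "$added" -eq "$total" ]; then
    echo "every file was treated as new"
fi

rm -rf "$demo_dir" "$demo_log"
```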

Expected behaviour

If the log file shows only some lines beginning with A, M or E, that is fine.

The problem with the current behaviour is that Borg consumes far more disk writes and processor time than it should.

Context

Borg: 1.1.16~ynh29

yunohost: repo: stable, version: 11.1.16
yunohost-admin: repo: stable, version: 11.1.9.4
moulinette: repo: stable, version: 11.1.4
ssowat: repo: stable, version: 11.1.4

Vitriia commented 8 months ago

It's definitely caused by the Borg cache TTL. But there may be a solution: a recent Borg update added the BORG_FILES_CACHE_SUFFIX variable.

In short, to know whether a file has changed, Borg keeps a files cache with the timestamp, size and inode of each file. This cache can't grow forever, so there is a TTL. On each backup run by the same user on the same machine, a counter is incremented on every cache entry that was not seen in that backup, and when it reaches the TTL (default: 20) the entry is removed from the cache. You can change the TTL value (BORG_FILES_CACHE_TTL), but that's not the best idea.
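The ageing rule above can be sketched as a toy model. It is not Borg's code, and it rests on two assumptions that are mine: several separate backups share one files cache and run in rotation, and an entry is dropped as soon as its age reaches the TTL. The function name `entry_survives` is made up.

```shell
#!/bin/sh
# Toy model only, NOT Borg's implementation.
# Assumption: an entry's age goes up by one on every backup run that does not
# see the file, and the entry is dropped once its age reaches the TTL.

# entry_survives N_BACKUPS TTL
# N_BACKUPS = number of distinct backups sharing one files cache, run in turn.
# Prints "kept" if an entry created by one backup is still cached when that
# same backup runs again, "expired" otherwise.
entry_survives() {
    n_backups=$1
    ttl=$2
    age=0
    other=1
    # Every other backup in the rotation runs once without seeing the file.
    while [ "$other" -lt "$n_backups" ]; do
        age=$((age + 1))
        if [ "$age" -ge "$ttl" ]; then
            echo "expired"
            return 0
        fi
        other=$((other + 1))
    done
    echo "kept"
}

entry_survives 5 20    # few backups share the cache
entry_survives 25 20   # more backups in the rotation than the TTL allows
```

Under these assumptions, entries survive only while fewer backups share the cache than the TTL allows, which is why splitting the cache per backup sidesteps the problem without touching the TTL.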

The solution could be to use the BORG_FILES_CACHE_SUFFIX variable. Instead of using the main cache file, this variable lets you set up multiple small cache files. My current setup consists of a hook.d/backup_method/borg file modified so that this variable is added alongside the other variables:

export BORG_FILES_CACHE_SUFFIX="$2"

It takes the name of the backup and uses it as the name of the cache file.

The limitation is that this is only viable if you don't manually use Borg as a YunoHost backup method, because it will add a cache file for every backup name. An improvement would be to get the name of the backed-up app and use it directly as the cache suffix, or to limit this to automatic backups, but I don't know how to do this.
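One possible way to limit the suffix to automatic backups is to set it only when the backup name matches a known pattern. The sketch below is untested against a real YunoHost install: the "auto_" prefix, the `files_cache_suffix` function and the sample name are all assumptions made for illustration, so check how your automatic backups are actually named before relying on it.

```shell
#!/bin/sh
# files_cache_suffix NAME
# Prints NAME as the cache suffix when it starts with the HYPOTHETICAL prefix
# "auto_", and prints nothing for any other (for example manual) backup name.
files_cache_suffix() {
    case "$1" in
        auto_*) printf '%s\n' "$1" ;;
        *)      : ;;
    esac
}

# Stands in for the hook's "$2" (the backup name); the value is made up.
name="auto_nextcloud"

suffix=$(files_cache_suffix "$name")
if [ -n "$suffix" ]; then
    # Automatic backup: use a dedicated files cache for this name.
    export BORG_FILES_CACHE_SUFFIX="$suffix"
else
    # Anything else: fall back to Borg's default files cache.
    unset BORG_FILES_CACHE_SUFFIX
fi
```

Manual backups would then keep sharing the default cache instead of creating one cache file per name.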

Vitriia commented 8 months ago

And I closed this issue by mistake :")