Closed Apollon77 closed 3 years ago
If someone out there is willing to implement this feature properly and submit a patch, I will be happy to integrate it.
Some comments/guides on this (including the reasons why I personally did not miss the feature for more than 15 years - but, of course, opinions are always subjective...):
1. The feature would need to be optional (see 2.).
2. Calculating checksums of each file on each backup run has a very serious impact on performance. Caching of checksums can partially help and is probably mandatory, but that would need to be implemented very carefully (considering unplanned interruptions, user-provided drivers etc. ...).
3. By the concept of hierarchically incremental backups, the waste of storage is limited to O(log N), where N is the number of backups. This mechanism alleviates the waste of storage capacity (though not the amount of write operations, which may be a problem if the backup device is an SSD).
I am not a big fan of storing hashes for each file. Performance and storage questions arise...
Worked example: The data storage on my primary server has about 600,000 files. With a sha256sum (64 hex digits) for each file + file path (assuming 36 chars per path for easier calculation), it's 600,000 * 100 = 60,000,000 bytes, around 60 MB... This data has to be present while comparing.
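To make the per-file hash idea a bit more concrete, here is a minimal sketch of how such a hash list could be built and compared between two runs. It is not something the backup script does today. The function names make_manifest and changed_files are made up for this example, and it assumes GNU coreutils' sha256sum and file names without newlines or backslashes.

```shell
# Hypothetical helpers for illustration only; not part of the backup script.

# Print one "<sha256>  <relative path>" line per regular file below directory $1.
make_manifest() {
    ( cd "$1" && find . -type f -exec sha256sum {} + )
}

# Print the paths listed in manifest $2 that are new or whose hash differs
# from manifest $1. A changed timestamp alone does not show up here.
changed_files() {
    awk '
        FILENAME == ARGV[1] { seen[$0]; next }   # remember every old "hash  path" line
        !($0 in seen) { print substr($0, 67) }   # 64 hex digits + 2 spaces, then the path
    ' "$1" "$2"
}
```

Usage would be something like running make_manifest on the data directory after each backup, keeping the output in a file outside that directory, and passing the previous and the current file to changed_files. Every run still has to read every file once, which is exactly the performance cost mentioned above.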
The other way around would be to have the files named "better". It's like it is with log rotation... By default it only adds an incrementing number to the files (my.log -> my.log.1 -> my.log.2.gz), but you can configure it to add dates (my.log -> my.log.20181224 -> my.log.20181223.gz), so it can be backed up easily without comparing file hashes...
What about doing the same for your DB backup? You can run it at PRE_BACKUP, containing something like:
oldpath=$(pwd)
cd /path/to/my/db/backup
rename 's/^\d+T\d+Z(.*)$/backup$1/' 2*
cd "$oldpath"
Just try it. Navigate to your backup and run
rename -n 's/^\d+T\d+Z(.*)$/backup$1/' 2*
The -n option will just print what would be renamed without renaming anything, so you can check whether it works for you. I checked it against the file listing provided on this page and it works flawlessly.
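Note that the command above uses the Perl variant of rename (the one that takes an s/// expression). The rename shipped with util-linux has a different syntax. If only that one is available, the same idea can be sketched in plain shell. The function name strip_timestamp and its "-n" argument are made up for this example and only mimic the dry-run behaviour described above.

```shell
# Hypothetical helper: for every file in directory $1 whose name starts with
# "<digits>T<digits>Z", replace that timestamp with the word "backup".
# Pass "-n" as second argument for a dry run that only prints the plan.
strip_timestamp() {
    dir=$1
    dry_run=${2:-}
    for path in "$dir"/2*; do
        [ -e "$path" ] || continue        # the glob matched nothing
        name=${path##*/}
        # sed prints the new name only if the pattern matched (-n plus p flag)
        new=$(printf '%s\n' "$name" |
            sed -n 's/^[0-9][0-9]*T[0-9][0-9]*Z\(.*\)$/backup\1/p')
        [ -n "$new" ] || continue         # name does not carry a timestamp
        if [ "$dry_run" = "-n" ]; then
            echo "would rename $name -> $new"
        elif [ -e "$dir/$new" ]; then
            echo "skipping $name: $new already exists" >&2
        else
            mv -- "$path" "$dir/$new"
        fi
    done
}
```

Call it with strip_timestamp /path/to/my/db/backup -n first, and drop the -n once the printed plan looks right.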
Closing this for now, since the benefit of such an option is questionable, and no other comments/contributions came up in the past two years ...
Thanks @boppy for the help and explanation!
hey,
I use your backup script on some machines, including one with an InfluxDB, and I want to back up the DB backup. When you create a DB backup for InfluxDB, it generates many data files: because InfluxDB uses "shards" internally (data chunks covering e.g. 7 days each), you get one file per shard. But they all get a new date when they are generated, so older shard backup files that were not changed contain the same data but carry a new timestamp. Here it would be great to have the file content (i.e. a checksum) used to identify changes instead of the date or such.
Is there any chance to get this feature?