Open SiddharthManthan opened 4 months ago
Does this repo setting not fit your use case? Setting this to 100% will re-download the whole thing ...
Does this repo setting not fit your use case? Setting this to 100% will re-download the whole thing ...
The repo is quite large. Downloading all the data will take a long time. It might not even be possible because the machine is not running 24*7.
hey, --read-data-subset=n/t
is only guaranteed to check the whole repo if you systematically increment n
which isn't something I expect to implement as that value would need to be tracked somewhere which adds complexity relative to the value of the feature. If you're using storage that ensures it's integrity e.g. S3 etc I think there's also very limited value to actually verifying data integrity by downloading it, this is supported primarily for users of local storage or sftp.
The alternative would be checking random chunks of the repo, but this likely isn't better than using a random percentage.
hey,
--read-data-subset=n/t
is only guaranteed to check the whole repo if you systematically incrementn
which isn't something I expect to implement as that value would need to be tracked somewhere which adds complexity relative to the value of the feature. If you're using storage that ensures it's integrity e.g. S3 etc I think there's also very limited value to actually verifying data integrity by downloading it, this is supported primarily for users of local storage or sftp.The alternative would be checking random chunks of the repo, but this likely isn't better than using a random percentage.
Check not only checks the integrity of repository files, but also checks for repository corruption. There have been many cases where the repository gets corrupted even though the blobs themselves are error free (due to bug in restic code or other reasons). So check is very useful for cloud storage, even if cloud storage ensures integrity. If corruption is detected before data needs to be restored, it can be fixed in timely manner. The feature does have value, even if it increases the complexity.
Check not only checks the integrity of repository files, but also checks for repository corruption.
That is true even in the current version of backrest. If you had set it to 0.001%
(0%
might work as well) it already checks the indexes, and stats the packs.
Check not only checks the integrity of repository files, but also checks for repository corruption.
That is true even in the current version of backrest. If you had set it to
0.001%
(0%
might work as well) it already checks the indexes, and stats the packs.
But it will miss the data blobs with 0.001 %. Only by reading the data, we can detect errors.
Here is an example of data corruption. All files in repo (index, data blobs, etc) appear fine and checksum correctly but the actual data (after restore) will have different checksum. There have been such bugs in the past (look in forums).
These corruption can only be detected by full data read.
This is continuation of #75.
Problem
Solution
--read-data-subset=n/t
(Documentation).I have filed it as bug report because the UI currently does not mention that all data is not checked. There should be a warning or proper check support should be implemented.