Add option to retry incomplete backups automatically for selected errors

aryoda commented 10 months ago

Background

Errors when taking a snapshot may lead to incomplete snapshots (status shown as "WITH ERRORS !").

There is currently no way to retry or continue the same snapshot. That would be more efficient than starting a completely new backup, because only the failed files would have to be retried.

Since BiT release v1.4.0 the rsync exit codes are also evaluated and logged (part of https://github.com/bit-team/backintime/issues/489), which improves error recognition but also makes more errors visible to the user.

Before v1.4.0, such errors were silently ignored (not recognized) by BiT, which left the user uninformed about files that were not backed up.

The user could spot the files that were not backed up only by explicitly looking into the full snapshot log details. Even there, the errors are not marked as [E] but as [I], because marking them correctly requires knowing and parsing all possible rsync error messages (see #1587). Example:

[I] Take snapshot (rsync: symlink has no referent: "/home/user/Documents/dead-link")
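
To illustrate why this needs knowledge of the rsync messages: BiT also logs ordinary rsync output as [I] lines, so a scan of the snapshot log can only flag lines whose text matches a known error message. Below is a minimal sketch of such a scan. The function name and the pattern list are hypothetical, and the list contains only the one message from the example above, so it is far from complete.

```python
import re

# Line format taken from the log example above:
# [I] Take snapshot (rsync: <message>)
INFO_RSYNC = re.compile(r'^\[I\] Take snapshot \((rsync: .*)\)\s*$')

# Hypothetical, incomplete list of message fragments that indicate an error.
# Only the fragment from the example above is included.
KNOWN_ERROR_FRAGMENTS = ("symlink has no referent",)


def hidden_rsync_errors(log_lines):
    """Return rsync messages logged as [I] that look like errors."""
    hits = []
    for line in log_lines:
        match = INFO_RSYNC.match(line.strip())
        if match is None:
            continue
        message = match.group(1)
        if any(fragment in message for fragment in KNOWN_ERROR_FRAGMENTS):
            hits.append(message)
    return hits
```

Any error message missing from the fragment list stays hidden, which is exactly the weakness described above.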

Feature request

Introduce a new "retry feature" to BiT:

Allow users to configure for which rsync exit codes an automatic retry shall be done (directly in the config file and via the GUI)
Optional: The maximum number of retries shall be configurable (even though more than "1" would not make much sense except e.g. in case of temporary connection/network issues)
Optional: A retry waiting time in seconds shall be configurable (before the retry shall start). This value should not be allowed to be too high because it blocks (via lock files) the BiT process and also all other BiT processes that start during that time. I would suggest a maximum of 120 seconds.
The retry shall try to transfer only the missing files (via suitable rsync options, probably --append and perhaps also --partial) to minimize the execution time and data transfer.
The retry shall be logged into the same snapshot log file
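
The requested behaviour could look roughly like the sketch below. It is not BiT code: the function name, its parameters and the log message are made up for illustration, and only the 120 second cap comes from the suggestion above. The rsync command is passed in unchanged, so choosing the rsync options for transferring only the missing files is left open here.

```python
import subprocess
import time

MAX_RETRY_WAIT_SECONDS = 120  # upper bound suggested in this request


def run_with_retry(cmd, retry_exit_codes, max_retries=1, wait_seconds=0,
                   run=subprocess.call, sleep=time.sleep, log=print):
    """Run cmd and repeat it while its exit code is in retry_exit_codes.

    Returns (last_exit_code, retries_done). The run, sleep and log
    callables are injectable so the loop can be tried without rsync.
    """
    # Clamp the waiting time so the lock files are not held for too long.
    wait_seconds = min(max(wait_seconds, 0), MAX_RETRY_WAIT_SECONDS)
    retries_done = 0
    while True:
        exit_code = run(cmd)
        if exit_code not in retry_exit_codes or retries_done >= max_retries:
            return exit_code, retries_done
        retries_done += 1
        # A real implementation would write this into the snapshot log.
        log(f"rsync exit code {exit_code}: "
            f"retry {retries_done}/{max_retries} in {wait_seconds}s")
        sleep(wait_seconds)
```

A call such as `run_with_retry(rsync_cmd, {23, 24}, max_retries=1, wait_seconds=30)` would retry once on rsync's partial-transfer exit codes 23 and 24 and hand any other exit code straight back to the caller.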

That's a fair question. IMHO, I would consider any snapshot as failed only if there is still an error at the last retry. So I would add another value, 'retry' (9), to the <backintime_status> ($3) passed to the user-callback for every attempt that is not the last one, and pass the result of the last attempt as before.

Do we need an additional argument to indicate the number of retries before the snapshot finally failed (or succeeded)?

I don't think that would be necessary since I don't see what type of meaningful action the user could take.

However, I would add another parameter to indicate the <rsync_status> - instead of having to add some convoluted logic to get it in the user-callback which can be too complex for some users to implement. That means (unless I missed something):

$4 when $3=2
$6 when $3=4
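
From the user's side, a callback written in Python could then read the proposed values roughly as follows. This is only a sketch of the proposal: the 'retry' value 9 and the <rsync_status> positions are taken from this comment and do not exist in BiT today, and the function and dictionary names are hypothetical.

```python
import sys

RETRY = 9  # proposed new <backintime_status> value, not part of BiT today

# Proposed argument positions of <rsync_status>, keyed by the value of $3.
RSYNC_STATUS_POSITION = {2: 4, 4: 6}


def parse_callback_args(argv):
    """Parse arguments laid out like sys.argv, where argv[3] is $3."""
    status = int(argv[3])
    position = RSYNC_STATUS_POSITION.get(status)
    rsync_status = None
    # The extra argument may be absent on versions without this feature.
    if position is not None and len(argv) > position:
        rsync_status = int(argv[position])
    return {
        "status": status,
        "is_retry": status == RETRY,
        "rsync_status": rsync_status,
    }


if __name__ == "__main__" and len(sys.argv) > 3:
    print(parse_callback_args(sys.argv))
```

With this layout a callback can ignore intermediate attempts by returning early when `is_retry` is true, and react only to the final result.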

Add option to retry incomplete backups automatically for selected errors #1591
