bit-team / backintime

Back In Time - An easy-to-use backup tool for GNU Linux using rsync in the back
https://backintime.readthedocs.io
GNU General Public License v2.0

Add option to retry incomplete backups automatically for selected errors #1591

Open aryoda opened 10 months ago

aryoda commented 10 months ago

Background

Errors when taking a snapshot may lead to incomplete snapshots (status shown as "WITH ERRORS !").

There is currently no way to retry or continue the same snapshot. Doing so would be more efficient than starting a completely new backup, since only the failed files would have to be retried.

Since BiT release v1.4.0, the rsync exit codes are also evaluated and logged (part of https://github.com/bit-team/backintime/issues/489), which improves error detection but also makes more errors visible.

Before v1.4.0, such errors were silently ignored (not recognized) by BiT, which left the user uninformed about files that were not backed up.

The user could spot these files only by explicitly inspecting the full snapshot log, and even there the errors are marked as [I] rather than [E], because marking them correctly would require knowing and parsing all possible rsync error messages (see #1587). Example:

[I] Take snapshot (rsync: symlink has no referent: "/home/user/Documents/dead-link")

Feature request

Introduce a new "retry feature" to BiT:

  1. Allow users to configure for which rsync exit codes an automatic retry shall be done (directly in the config file and via the GUI)
  2. Optional: The maximum number of retries shall be configurable (even though more than "1" would not make much sense except e.g. in case of temporary connection/network issues)
  3. Optional: A retry waiting time in seconds shall be configurable (the delay before the retry starts). This value should not be allowed to be too high because it blocks (via lock files) the BiT process and also all other BiT processes that start during that time. I would suggest a maximum of 120 seconds.
  4. The retry shall try to transfer only the missing files (via suitable rsync options, probably --append and perhaps also --partial) to minimize the execution time and data transfer.
  5. The retry shall be logged into the same snapshot log file
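
Items 1 to 3 can be sketched as a small retry loop. This is only an illustration of the requested behaviour, not BiT code: the function name `run_with_retries`, its parameters and the constant `MAX_WAIT_SECONDS` are hypothetical, and the actual rsync call is abstracted as a callable that returns the exit code.

```python
import time
from typing import Callable, Iterable

# Upper bound for the waiting time, as suggested in item 3 (assumption).
MAX_WAIT_SECONDS = 120


def run_with_retries(run_once: Callable[[], int],
                     retry_codes: Iterable[int],
                     max_retries: int = 1,
                     wait_seconds: float = 0,
                     sleep: Callable[[float], None] = time.sleep,
                     log: Callable[[str], None] = print) -> int:
    """Call run_once() until it returns 0, returns an exit code that is
    not configured for retry, or max_retries is exhausted.

    Returns the exit code of the last attempt.
    """
    retry_codes = set(retry_codes)
    # Clamp the configured waiting time to the range 0..MAX_WAIT_SECONDS.
    wait_seconds = max(0, min(wait_seconds, MAX_WAIT_SECONDS))
    attempt = 0
    while True:
        code = run_once()
        if code == 0 or code not in retry_codes or attempt >= max_retries:
            return code
        attempt += 1
        log(f"rsync exit code {code}: retry {attempt}/{max_retries} "
            f"in {wait_seconds}s")
        sleep(wait_seconds)


# Demo with simulated exit codes instead of a real rsync call:
# the first attempt times out (rsync exit code 30), the retry succeeds.
simulated = iter([30, 0])
final_code = run_with_retries(lambda: next(simulated),
                              retry_codes={12, 30},
                              max_retries=2,
                              wait_seconds=0)
print(final_code)
```

Passing `sleep` and `log` as parameters keeps the loop testable without real delays and would let the retry messages go into the same snapshot log (item 5).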

See also

Follow up of #1573

aryoda commented 8 months ago

Open questions:

  1. Does the user callback logic require changes?
    • Call the "error" reason for each error even if a retry is going to be started?
    • Do we need an additional argument to indicate the number of retries before the snapshot finally failed (or succeeded)?

jean-christophe-manciot commented 8 months ago

@aryoda

  • Call the "error" reason for each error even if a retry is going to be started?

That's a fair question. IMHO, a snapshot should be considered failed only if the last retry still ends with an error. So I would add another value, 'retry' (9), to <backintime_status> ($3) passed to the user-callback, used for every failed attempt except the last one; the result of the last attempt would be passed as before.
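
A user-callback that tells the two cases apart could look roughly like the sketch below. It assumes the callback receives the profile ID, profile name and status as its first three arguments, followed by status-specific arguments, and that "4" is the existing error status. The value "9" is only the proposal made here and is not implemented in BiT; the function name `handle` is hypothetical.

```python
import sys

REASON_ERROR = "4"  # existing "error" status passed as $3 (assumption)
REASON_RETRY = "9"  # hypothetical: proposed above, not implemented in BiT


def handle(argv):
    """Return a message for the statuses this sketch cares about, else None."""
    if len(argv) < 4:
        return None
    profile_id, profile_name, reason = argv[1:4]
    extra = argv[4:]
    if reason == REASON_RETRY:
        # Intermediate failure: a retry will follow, so do not alert yet.
        return f"profile {profile_name}: snapshot attempt failed, retry pending"
    if reason == REASON_ERROR:
        # Final failure: reported the same way as before.
        code = extra[0] if extra else "unknown"
        return f"profile {profile_name}: snapshot failed (error code {code})"
    return None


if __name__ == "__main__":
    message = handle(sys.argv)
    if message:
        print(message, file=sys.stderr)
```

With this split, existing callbacks that only react to the error status would keep working unchanged and would simply ignore the new value.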

  • Do we need an additional argument to indicate the number of retries before the snapshot finally failed (or succeeded)?

I don't think that would be necessary since I don't see what type of meaningful action the user could take.

However, I would add another parameter to indicate the <rsync_status> - instead of having to add some convoluted logic to get it in the user-callback which can be too complex for some users to implement. That means (unless I missed something):