garethgeorge / backrest

Backrest is a web UI and orchestrator for restic backup.
GNU General Public License v3.0

On Error hook not triggered by warnings #254

Closed by modem 4 months ago

modem commented 5 months ago

I've set up a hook to send a Discord notification in case of an error in a plan (and repo), but it doesn't seem to be triggered.
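For context, here's a minimal sketch of what such a plan-level hook looks like in Backrest's JSON config, as I understand the schema; the webhook URL is a placeholder:

```json
{
  "hooks": [
    {
      "conditions": ["CONDITION_SNAPSHOT_ERROR"],
      "actionDiscord": {
        "webhookUrl": "https://discord.com/api/webhooks/<id>/<token>"
      }
    }
  ]
}
```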

In the log, I don't see any reference to the hook when an error occurs. Over the last few days, the plan failed twice due to errors accessing some files.

Here's the log of today's error:

```
2024-04-29T11:26:32.044+0100    DEBUG   got snapshots for plan  {"repo": "Backup_Config", "repo": "Backup_Config", "count": 7, "plan": "Config_Data", "tag": "plan:Config_Data"}
2024-04-29T11:26:30.335+0100    INFO    Starting backup {"plan": "Config_Data", "opId": 1797572856140269327}
2024-04-29T11:26:30.335+0100    DEBUG   repo orchestrator starting backup   {"repo": "Backup_Config"}
2024-04-29T11:49:53.187+0100    WARN    an unknown error was encountered in processing item: /raid/Config_Data/prowdata/config/prowdata.db-shm
2024-04-29T11:49:53.187+0100    WARN    an unknown error was encountered in processing item: /raid/Config_Data/prowdata/config/prowdata.db-wal
2024-04-29T11:50:34.655+0100    DEBUG   wrote operation log to 2024-04-29-logs.tar/18432
2024-04-29T11:50:34.787+0100    INFO    task finished   {"task": "backup for plan \"Config_Data\"", "duration": "24m4.636364665s"}
2024-04-29T11:50:34.907+0100    INFO    scheduling task {"task": "backup for plan \"Config_Data\"", "runAt": "2024-04-30T11:00:00+01:00"}
2024-04-29T11:50:34.907+0100    INFO    running task    {"task": "forget for plan \"Config_Data\""}
2024-04-29T11:50:36.309+0100    DEBUG   forget snapshots    {"plan": "Config_Data", "count": 1, "policy": "policy_keep_last_n:7"}
2024-04-29T11:50:37.068+0100    DEBUG   indexed known snapshot IDs for repo Backup_Config in 585.332µs
2024-04-29T11:50:36.369+0100    INFO    task finished   {"task": "forget for plan \"Config_Data\"", "duration": "1.462489529s"}
2024-04-29T11:50:36.361+0100    DEBUG   wrote operation log to 2024-04-29-logs.tar/21504
2024-04-29T11:50:34.319+0100    INFO    Backup complete {"plan": "Config_Data", "duration": "24m3.984502802s", "summary": "summary:{files_new:497  files_changed:892  files_unmodified:148552  dirs_new:61  dirs_changed:443  dirs_unmodified:219431  data_blobs:2737  tree_blobs:475  data_added:1688255741  total_files_processed:149941  total_bytes_processed:24676371628  total_duration:1441.720401441  snapshot_id:\"44c63b58cf348be66abcf40be3fbd6f6c4a514eb730344cb6832eca2828da5bd\"}"}
2024-04-29T11:50:37.084+0100    INFO    task finished   {"task": "index snapshots for plan \"Backup_Config\"", "duration": "714.210209ms"}
2024-04-29T11:50:34.555+0100    INFO    scheduling task {"task": "index snapshots for plan \"Backup_Config\"", "runAt": "2024-04-29T11:50:34+01:00"}
2024-04-29T11:50:34.555+0100    INFO    scheduling task {"task": "forget for plan \"Config_Data\"", "runAt": "2024-04-29T11:50:34+01:00"}
2024-04-29T11:50:36.369+0100    INFO    running task    {"task": "index snapshots for plan \"Backup_Config\""}
2024-04-29T11:50:37.083+0100    DEBUG   indexed snapshots   {"repo": "Backup_Config", "duration": "14.993076ms", "alreadyIndexed": 6, "newlyAdded": 1, "markedForgotten": 0}
```

To test the notification connection, I edited the plan to also send the same notification on plan start and end, and it was triggered at both the backup start and the backup end.

```
2024-04-29T13:27:01.908+0100    INFO    Starting backup {"plan": "Config_Data", "opId": 1797672006457493292}
2024-04-29T13:27:01.908+0100    INFO    running hook    {"plan": "Config_Data", "opId": 0, "hook": "plan/Config_Data/hook/0"}
2024-04-29T13:27:03.166+0100    DEBUG   repo orchestrator starting backup   {"repo": "Backup_Config"}
2024-04-29T13:27:23.028+0100    DEBUG   got snapshots for plan  {"repo": "Backup_Config", "repo": "Backup_Config", "count": 7, "plan": "Config_Data", "tag": "plan:Config_Data"}
2024-04-29T13:38:43.666+0100    DEBUG   Backup completed    {"repo": "Backup_Config", "repo": "Backup_Config", "duration": "11m20.637998906s"}
2024-04-29T13:38:43.666+0100    INFO    running hook    {"plan": "Config_Data", "opId": 0, "hook": "plan/Config_Data/hook/0"}
2024-04-29T13:38:44.575+0100    INFO    Backup complete {"plan": "Config_Data", "duration": "11m42.666992214s", "summary": "summary:{files_new:9  files_changed:625  files_unmodified:149314  dirs_changed:62  dirs_unmodified:219873  data_blobs:498  tree_blobs:62  data_added:356318587  total_files_processed:149948  total_bytes_processed:24669174726  total_duration:680.116178011  snapshot_id:\"24e68206cf0c180114cfe35f057b0de3cf11363fd1a71a993f493cb78ec59d41\"}"}
```

Expected behavior: I expect the hook to be triggered in case of a backup failure. This trigger has already worked in the past.

Platform Info

garethgeorge commented 5 months ago

Hey, forgot to leave a comment on this thread. tl;dr: warnings don't trigger error hooks at the moment. I plan to add a CONDITION_SNAPSHOT_WARNING condition to capture these cases.
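Once that lands, the config-side change should just be adding the new condition alongside the existing error condition in the hook. A hedged sketch (the condition name is the one proposed above and could change before release; the webhook URL is a placeholder):

```json
{
  "conditions": [
    "CONDITION_SNAPSHOT_ERROR",
    "CONDITION_SNAPSHOT_WARNING"
  ],
  "actionDiscord": {
    "webhookUrl": "https://discord.com/api/webhooks/<id>/<token>"
  }
}
```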

modem commented 5 months ago

These warnings have triggered the Discord hook via the error conditions before; the example I have is from April 4th.

Maybe something changed in the latest versions. The last trigger for me was on April 10th, so if anything changed in Backrest, it was after that date. I don't get these warnings all the time; they are sporadic.

For me, capturing these errors with the existing error conditions is fine, but a new warning condition would work as well, and the latter may be useful for other people.

garethgeorge commented 4 months ago

Hey, you're right that there was a behavior change here, which isn't great. Prior to the last release I did some work to downgrade warnings so that they no longer trigger error hooks, but you're right to notice that there currently isn't a good way to get notified of I/O errors.

This will be fixed in the v1.0.0 release by adding the warning condition hook. I do want users to have the option to ignore warnings if they see them commonly, e.g. when backing up big directories of constantly changing sqlite databases and they aren't inclined to set up hooks to properly stop the applications doing the writes.

Specifically addressed in https://github.com/garethgeorge/backrest/commit/f0ee20f53de58e0a0a0a63137e203161d8acce4d
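As an aside, for the sqlite case above, the noisy sidecar files can be excluded at the plan level instead of silencing warnings entirely. A minimal sketch, assuming the plan's excludes field passes glob patterns through to restic's --exclude; the patterns match the .db-shm/.db-wal files from the log earlier in this thread:

```json
{
  "id": "Config_Data",
  "repo": "Backup_Config",
  "paths": ["/raid/Config_Data"],
  "excludes": [
    "*.db-shm",
    "*.db-wal"
  ]
}
```

Note that excluding the WAL sidecars only silences the warnings; it doesn't make the captured database consistent. Stopping the writer before the snapshot remains the safer fix.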

modem commented 4 months ago

Thanks for sharing your point of view. I can wait for the next release to have the warning condition. Is there a way to stop a Docker container from within Backrest when Backrest itself runs in another Docker container?
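For what it's worth, one common approach (an assumption on my part, not something confirmed in this thread) is to mount the host's Docker socket into the Backrest container and drive the Docker Engine API from a command hook. A minimal sketch, assuming /var/run/docker.sock is mounted into the container, curl is available in the image, and the target container is named myapp (all placeholders):

```json
{
  "conditions": ["CONDITION_SNAPSHOT_START"],
  "actionCommand": {
    "command": "curl --unix-socket /var/run/docker.sock -X POST http://localhost/containers/myapp/stop"
  }
}
```

A matching CONDITION_SNAPSHOT_END hook could POST to /containers/myapp/start to bring the container back up after the backup. Keep in mind that mounting the Docker socket effectively grants root on the host.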

garethgeorge commented 4 months ago

Added the new backup warning hook type in v1.0.0, which should address this.