garethgeorge / backrest

Backrest is a web UI and orchestrator for restic backup.
GNU General Public License v3.0
787 stars 31 forks source link

command output is not valid JSON: invalid character 'L' looking for beginning of value #338

Closed ovizii closed 1 day ago

ovizii commented 2 weeks ago

Describe the bug One of my three plans has started giving this error on its nightly run.

The recent changes made to backrest are these and I will revert them to see if it makes any difference.

To Reproduce Steps to reproduce the behaviour:

Not sure, error just popped up after months of perfectly working.

Expected behaviour I'd expect my plans to execute smoothly.

Platform Info

Additional context

Docker logs:

backrest  | 2024-06-14T03:00:01.334+0200        DEBUG   repo orchestrator starting backup       {"repo": "OneDrive1TB"}
backrest  | 2024-06-14T03:00:12.466+0200        ERROR   task failed     {"task": "backup for plan \"Documents\"", "error": "failed to get snapshots for plan: get snapshots for plan \"Documents\": command \"/bin/nice -n 10 ionice -c 2 -n 7 /bin/restic-0.16.4 snapshots --json -o sftp.args=-oBatchMode=yes --tag plan:Documents,created-by:pve02\" failed: command output is not valid JSON: invalid character 'L' looking for beginning of value\nOutput:\nLoad(<config/0000000000>, 0, 0) returned error, retrying after 394.755275ms: <config/0000000000> does not exist", "duration": "12.449950903s"}
backrest  | 2024-06-14T03:00:12.542+0200        INFO    scheduling task {"task": "backup for plan \"Documents\"", "runAt": "2024-06-15T03:00:00+02:00"}
backrest  | 2024-06-14T04:00:00.076+0200        INFO    running task    {"task": "backup for plan \"Pictures\""}

Plan logs:

13T03:00:07.104978857Z","parent":"78e443f501268d775cc98cae536ab7b8e9561b841571c763c8c9c32d81f61598","tree":"f33711667711f705ab3e1f7e36f9dd7893d946bce8d73407eb53861a52c3890f","paths":["/mnt/documents"],"hostname":"backrest","username":"root","tags":["plan:Documents","created-by:pve02"],"program_version":"restic 0.16.4","id":"0d0a48df30ef507c27944936ebb821a42c0fe3aa7826af474c22ed7090c37e6c","short_id":"0d0a48df"}]
command "/bin/nice -n 10 ionice -c 2 -n 7 /bin/restic-0.16.4 snapshots --json -o sftp.args=-oBatchMode=yes --tag plan:Documents,created-by:pve02" failed: command output is not valid JSON: invalid character 'L' looking for beginning of value
Output:
Load(<config/0000000000>, 0, 0) returned error, retrying after 394.755275ms: <config/0000000000> does not exist
ovizii commented 2 weeks ago

I reverted those changes I mentioned above and ran the backup manually. The backup succeeded but the forget part now fails:

backrest  | 2024-06-14T10:30:23.750+0200        WARN    forget for plan "OneDrive1TB" in repo "Documents"       found legacy snapshots without instance ID, recommending legacy forget behavior.
backrest  | 2024-06-14T10:30:23.750+0200        WARN    forget for plan "OneDrive1TB" in repo "Documents"       forgetting snapshots without instance ID, using legacy behavior (e.g. --tags not including instance ID)
backrest  | 2024-06-14T10:30:23.750+0200        WARN    forget for plan "OneDrive1TB" in repo "Documents"       to avoid this warning, tag all snapshots with the instance ID e.g. by running:
backrest  | restic tag --set 'plan:Documents' --set 'created-by:pve02' --tag 'plan:Documents'
backrest  | 2024-06-14T10:30:26.449+0200        ERROR   task failed     {"task": "forget for plan \"OneDrive1TB\" in repo \"Documents\"", "error": "forget: get snapshots for repo OneDrive1TB: command \"/bin/restic-0.16.4 forget --json --keep-hourly 24 --keep-daily 7 --keep-weekly 4 --keep-monthly 3 --keep-yearly 5 -o sftp.args=-oBatchMode=yes --tag plan:Documents --group-by \" failed: exit status 1", "duration": "2.716963852s"}
backrest  | 2024-06-14T10:30:26.449+0200        INFO    running task    {"task": "collect garbage"}
backrest  | 2024-06-14T10:30:26.475+0200        INFO    collecting garbage      {"operations_removed": 0}
garethgeorge commented 2 weeks ago

Hey, is the initial error you saw transient or reproducible? Looks like a storage layer error but could also be some interaction with the settings you're enabled (e.g. does it only show up when using nice / ionice and go away otherwise?).

ovizii commented 2 weeks ago

Not quite sure. I had reverted the changes mentioned above and then all 3 jobs with a cron schedfule failed with the same error:

command: /bin/restic-0.16.4 forget --json --keep-hourly 24 --keep-daily 7 --keep-weekly 4 --keep-monthly 3 --keep-yearly 5 -o sftp.args=-oBatchMode=yes --tag plan:Documents --group-by 
unable to create lock in backend: repository is already locked by PID 20584 on backrest by root (UID 0, GID 0)
lock was created at 2024-06-14 10:27:41 (16h33m24.140672018s ago)
storage ID d7663af0
the `unlock` command can be used to remove stale locks
command "/bin/restic-0.16.4 forget --json --keep-hourly 24 --keep-daily 7 --keep-weekly 4 --keep-monthly 3 --keep-yearly 5 -o sftp.args=-oBatchMode=yes --tag plan:Documents --group-by " failed: exit status 1

This is weird, I though backrest serializes plans. Also, I had never had to check the "auto unlock" option as backrest is the only instance which accesses this repo.

Running the three plans manually now. Will reply after I know more.

The prune and check of the repo also failed with the same "repo locked, remove lock" error.

ovizii commented 2 weeks ago

manual backups worked, manual prune and manual checks worked. Will close this issue if the cron schedule tomorrow does not yield anythign new.

garethgeorge commented 1 day ago

Hey, circling back on this -- any luck with use of auto unlock / over time? It may also be worth updating to 1.2.0 if you've not already as this fixed a rare deadlock which seemed to leave repos in bad lock states (in rare cases) -- should now be patched.

ovizii commented 1 day ago

Sorry about that, I totally forgot to reply and close this ticket.