command output is not valid JSON: invalid character 'L' looking for beginning of value

ovizii commented 2 weeks ago

Describe the bug One of my three plans has started giving this error on its nightly run.

The recent changes made to backrest are these and I will revert them to see if it makes any difference.

I changed the scheduling method from cron to time based and back to cron and added a few hooks to trigger healthcheck as a monitoring tool.
I changed "command modifiers" to IO_BEST _EFFORT_LOW and CPU Priority to CPU_LOW.
I ticked the "autounlock" option of my repository

To Reproduce Steps to reproduce the behaviour:

Not sure, error just popped up after months of perfectly working.

Expected behaviour I'd expect my plans to execute smoothly.

Platform Info

OS and Architecture Debian 12 with Docker
Backrest Version 1.1.0

Additional context

Docker logs:

backrest  | 2024-06-14T03:00:01.334+0200        DEBUG   repo orchestrator starting backup       {"repo": "OneDrive1TB"}
backrest  | 2024-06-14T03:00:12.466+0200        ERROR   task failed     {"task": "backup for plan \"Documents\"", "error": "failed to get snapshots for plan: get snapshots for plan \"Documents\": command \"/bin/nice -n 10 ionice -c 2 -n 7 /bin/restic-0.16.4 snapshots --json -o sftp.args=-oBatchMode=yes --tag plan:Documents,created-by:pve02\" failed: command output is not valid JSON: invalid character 'L' looking for beginning of value\nOutput:\nLoad(<config/0000000000>, 0, 0) returned error, retrying after 394.755275ms: <config/0000000000> does not exist", "duration": "12.449950903s"}
backrest  | 2024-06-14T03:00:12.542+0200        INFO    scheduling task {"task": "backup for plan \"Documents\"", "runAt": "2024-06-15T03:00:00+02:00"}
backrest  | 2024-06-14T04:00:00.076+0200        INFO    running task    {"task": "backup for plan \"Pictures\""}

Plan logs:

13T03:00:07.104978857Z","parent":"78e443f501268d775cc98cae536ab7b8e9561b841571c763c8c9c32d81f61598","tree":"f33711667711f705ab3e1f7e36f9dd7893d946bce8d73407eb53861a52c3890f","paths":["/mnt/documents"],"hostname":"backrest","username":"root","tags":["plan:Documents","created-by:pve02"],"program_version":"restic 0.16.4","id":"0d0a48df30ef507c27944936ebb821a42c0fe3aa7826af474c22ed7090c37e6c","short_id":"0d0a48df"}]
command "/bin/nice -n 10 ionice -c 2 -n 7 /bin/restic-0.16.4 snapshots --json -o sftp.args=-oBatchMode=yes --tag plan:Documents,created-by:pve02" failed: command output is not valid JSON: invalid character 'L' looking for beginning of value
Output:
Load(<config/0000000000>, 0, 0) returned error, retrying after 394.755275ms: <config/0000000000> does not exist

ovizii commented 2 weeks ago

I reverted those changes I mentioned above and ran the backup manually. The backup succeeded but the forget part now fails:

backrest  | 2024-06-14T10:30:23.750+0200        WARN    forget for plan "OneDrive1TB" in repo "Documents"       found legacy snapshots without instance ID, recommending legacy forget behavior.
backrest  | 2024-06-14T10:30:23.750+0200        WARN    forget for plan "OneDrive1TB" in repo "Documents"       forgetting snapshots without instance ID, using legacy behavior (e.g. --tags not including instance ID)
backrest  | 2024-06-14T10:30:23.750+0200        WARN    forget for plan "OneDrive1TB" in repo "Documents"       to avoid this warning, tag all snapshots with the instance ID e.g. by running:
backrest  | restic tag --set 'plan:Documents' --set 'created-by:pve02' --tag 'plan:Documents'
backrest  | 2024-06-14T10:30:26.449+0200        ERROR   task failed     {"task": "forget for plan \"OneDrive1TB\" in repo \"Documents\"", "error": "forget: get snapshots for repo OneDrive1TB: command \"/bin/restic-0.16.4 forget --json --keep-hourly 24 --keep-daily 7 --keep-weekly 4 --keep-monthly 3 --keep-yearly 5 -o sftp.args=-oBatchMode=yes --tag plan:Documents --group-by \" failed: exit status 1", "duration": "2.716963852s"}
backrest  | 2024-06-14T10:30:26.449+0200        INFO    running task    {"task": "collect garbage"}
backrest  | 2024-06-14T10:30:26.475+0200        INFO    collecting garbage      {"operations_removed": 0}

garethgeorge commented 2 weeks ago

Hey, is the initial error you saw transient or reproducible? Looks like a storage layer error but could also be some interaction with the settings you're enabled (e.g. does it only show up when using nice / ionice and go away otherwise?).

ovizii commented 2 weeks ago

Not quite sure. I had reverted the changes mentioned above and then all 3 jobs with a cron schedfule failed with the same error:

command: /bin/restic-0.16.4 forget --json --keep-hourly 24 --keep-daily 7 --keep-weekly 4 --keep-monthly 3 --keep-yearly 5 -o sftp.args=-oBatchMode=yes --tag plan:Documents --group-by 
unable to create lock in backend: repository is already locked by PID 20584 on backrest by root (UID 0, GID 0)
lock was created at 2024-06-14 10:27:41 (16h33m24.140672018s ago)
storage ID d7663af0
the `unlock` command can be used to remove stale locks
command "/bin/restic-0.16.4 forget --json --keep-hourly 24 --keep-daily 7 --keep-weekly 4 --keep-monthly 3 --keep-yearly 5 -o sftp.args=-oBatchMode=yes --tag plan:Documents --group-by " failed: exit status 1

This is weird, I though backrest serializes plans. Also, I had never had to check the "auto unlock" option as backrest is the only instance which accesses this repo.

Running the three plans manually now. Will reply after I know more.

The prune and check of the repo also failed with the same "repo locked, remove lock" error.

ovizii commented 2 weeks ago

manual backups worked, manual prune and manual checks worked. Will close this issue if the cron schedule tomorrow does not yield anythign new.

garethgeorge commented 1 day ago

Hey, circling back on this -- any luck with use of auto unlock / over time? It may also be worth updating to 1.2.0 if you've not already as this fixed a rare deadlock which seemed to leave repos in bad lock states (in rare cases) -- should now be patched.

ovizii commented 1 day ago

Sorry about that, I totally forgot to reply and close this ticket.

I updated to backrest 1.2.0 as soon as it was available
"auto unlock" checkbox is ticked
prune and check policy is back to cron
I removed all command modifiers Backups have been running successfully for the past 2 weeks.

garethgeorge / backrest

command output is not valid JSON: invalid character 'L' looking for beginning of value #338