djmaze / resticker

Run automatic restic backups via a Docker container.
https://hub.docker.com/r/mazzolino/restic/
Apache License 2.0

Fatal: unable to create lock in backend: repository is already locked by PID 55 on docker by root (UID 0, GID 0) #171

Open benjamin051000 opened 1 year ago

benjamin051000 commented 1 year ago

Running resticker latest, trying to back up both my Docker volumes and a folder in my home directory to Backblaze. I also use Immich, so I dump the Immich DB with a pre command and exclude some folders I don't want backed up.

docker-compose (running as a Portainer stack):

version: "3.3"

services:
  backup:
    image: mazzolino/restic
    hostname: docker
    # restart: unless-stopped
    environment:
      RUN_ON_STARTUP: "true"
      BACKUP_CRON: "0 0 3 * * *"
      RESTIC_REPOSITORY: b2:hpmelab-backup:/restic-repo
      RESTIC_PASSWORD: [redacted]
      RESTIC_BACKUP_SOURCES: /backup/
      RESTIC_BACKUP_ARGS: --tag docker-volumes --exclude "/backup/services/immich/backup/encoded-video" --exclude "/backup/services/immich/backup/thumbs"
      RESTIC_FORGET_ARGS: --keep-daily 7 --keep-weekly 4
      B2_ACCOUNT_ID: [redacted]
      B2_ACCOUNT_KEY: [redacted]
      PRE_COMMANDS: |-
        docker exec -t immich_postgres pg_dumpall -c -U postgres | gzip > "/backup/services/immich/db_dumps/dump.sql.gz"
      TZ: America/Chicago
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock  # for pre commands
      - /var/lib/docker/volumes:/backup/volumes:ro
      - /home/bw/services:/backup/services

  check:
    image: mazzolino/restic
    hostname: docker
    # restart: unless-stopped
    environment:
      RUN_ON_STARTUP: "false"
      CHECK_CRON: "0 0 4 * * *"  # 1hr after resticker backup/upload
      RESTIC_CHECK_ARGS: >-
        --read-data-subset=10%
      RESTIC_REPOSITORY: b2:hpmelab-backup:/restic-repo
      RESTIC_PASSWORD: [redacted]
      B2_ACCOUNT_ID: [redacted]
      B2_ACCOUNT_KEY: [redacted]
      TZ: America/Chicago

log:

Checking configured repository 'b2:hpmelab-backup:/restic-repo' ...
Repository found.
Executing backup on startup ...
docker exec -t immich_postgres pg_dumpall -c -U postgres | gzip > "/backup/services/immich/db_dumps/dump.sql.gz"
Starting Backup at 2023-08-08 23:36:02
no parent snapshot found, will read all files
Files:       43987 new,     0 changed,     0 unmodified
Dirs:         5203 new,     0 changed,     0 unmodified
Added to the repository: 30.791 GiB (29.821 GiB stored)
processed 43987 files, 33.560 GiB in 17:24
snapshot 5ebdeacd saved
Backup successful
Forget about old snapshots based on RESTIC_FORGET_ARGS = --keep-daily 7 --keep-weekly 4
Fatal: unable to create lock in backend: repository is already locked by PID 55 on docker by root (UID 0, GID 0)
lock was created at 2023-08-08 23:35:05 (18m25.400514635s ago)
storage ID f8c80095

Any ideas on why it fails after the backup is complete? I can even see the repo in Backblaze. It seems like the step it's failing on is when it schedules the cron task. Any help would be greatly appreciated, thanks!

benjamin051000 commented 1 year ago

Looks like this is the spot where it's failing, line 94:

https://github.com/djmaze/resticker/blob/65e361de864f11ffe353dbd6aa1d253eb8ecfb7b/backup#L93-L94

benjamin051000 commented 1 year ago

1.6.0 doesn't appear to have this issue and makes it past the forget step. I'll stay on that version until I hear about a fix.
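
In case it helps anyone, staying on that release is just a matter of pinning the image tag in the compose file (a minimal sketch; the rest of the service definition stays as above):

services:
  backup:
    image: mazzolino/restic:1.6.0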

benjamin051000 commented 1 year ago

https://github.com/restic/restic/issues/3491 may be related?

razaqq commented 12 months ago

Same issue here. No idea how to downgrade, though, once the repo is in the newer format.

djmaze commented 11 months ago

Not using B2 myself. Anyone want to try out mazzolino/restic:latest, which now contains restic 0.16.0?

littlegraycells commented 7 months ago

Has there been any progress on this issue? Checking my restic logs today, I realized my backups haven't been working for several weeks, and I found this error in the logs. Following the comment above, I pulled version 1.7.1, and the error still seems to be there.

Forget about old snapshots based on RESTIC_FORGET_ARGS = --keep-last 10 --keep-daily 7 --keep-weekly 5 --keep-monthly 12
repo already locked, waiting up to 0s for the lock
unable to create lock in backend: repository is already locked by PID 29 on restic_server by root (UID 0, GID 0)
lock was created at 2023-12-07 01:25:45 (922h55m39.463073982s ago)
storage ID 08cb065a
the `unlock` command can be used to remove stale locks

Based on restic/restic#2736, it appears that the current guidance is basically to run the unlock command before running other commands.
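
For reference, a manual unlock can be run inside the running container, which already has the repository and credentials in its environment (a sketch assuming the service/container from the compose file above is named backup):

# remove stale locks left behind by a dead restic process
docker exec backup restic unlock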

djmaze commented 7 months ago

@littlegraycells Yes, manually unlocking is still the suggested advice.

To be more precise, I would suggest always having a monitoring solution for backups (which you do not seem to have), for example by sending emails in error cases (as shown in the documentation), or, even better, by using something like Healthchecks to make sure failures are not missed. (I can warmly recommend the latter; you can also host it yourself!)

For me, having about 15 different servers, this procedure works very well.

That said, I can see that an auto-unlock solution as proposed in the restic issue could work. But that should be implemented there.

littlegraycells commented 7 months ago

@djmaze Thanks. I do run a self-hosted version of healthchecks.io currently.

Would you recommend running the unlock command with PRE_COMMANDS in the backup container?
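
For context, what I have in mind is something like this sketch, with the unlock running inside the backup container right before each backup:

environment:
  PRE_COMMANDS: |-
    restic unlock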

ThomDietrich commented 7 months ago

Hey @djmaze, that's an interesting comment. I use Uptime Kuma to observe services directly, as well as containers and their healthcheck status. I did not know that Healthchecks can be self-hosted!

Irrespective, what I found challenging with resticker is:

  • I do not want a notification if there is just a one-time sync issue. That can happen and is not a problem.
  • I want a warning after x consecutive unsuccessful backup attempts. I want a warning if the backup did not run for x days.

Do you have a solution to this? Cheers!

djmaze commented 7 months ago

• I do not want a notification if there is just a one-time sync issue. That can happen and is not a problem.

That's what POST_COMMANDS_INCOMPLETE is for. (Tbh, I personally do not (yet) use it because I have very few failures and it does not bug me.)
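
A minimal sketch of what that could look like; the echo is just a placeholder for whatever low-priority notification fits your setup:

environment:
  POST_COMMANDS_INCOMPLETE: |-
    echo "Backup finished with warnings (e.g. some files could not be read)"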

• I want a warning after x consecutive unsuccessful backup attempts. I want a warning if the backup did not run for x days.

Mhh... We could implement this in resticker. But if you are using Healthchecks, you could also solve it by just pinging Healthchecks using POST_COMMANDS_SUCCESS and then setting the check's grace period to x days. Healthchecks will then notify you when the grace period has been exceeded.
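
A sketch of that approach, assuming curl is available in the container and using a placeholder ping URL from your Healthchecks instance:

environment:
  POST_COMMANDS_SUCCESS: |-
    curl -fsS -m 10 --retry 3 https://hc-ping.com/<your-check-uuid>

Then set the check's expected period / grace time in Healthchecks to x days, and it will alert once no successful ping arrives within that window.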

djmaze commented 7 months ago

Would you recommend running the unlock command with PRE_COMMANDS in the backup container?

If you have only one host using the repository, this might make sense. But if there is more than one (as is the case, e.g., when running prunes on a bigger server, like I do), in my opinion that is too dangerous.

(I could agree with a solution which automatically removes locks that are e.g. > 24 hours old. But as I said I would prefer this to be solved upstream.)

razaqq commented 7 months ago

Well, currently resticker is completely unusable for many people because of the issue detailed above. Every time it tries to back up, it goes into an infinite loop trying to lock the repo.

djmaze commented 7 months ago

@razaqq Well, afaics there is still no reproducible test case.

As another workaround, you could also remove RESTIC_FORGET_ARGS and run the forget manually from time to time.
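
Running it by hand could look like this, reusing the retention flags from the compose file above (a sketch assuming the service/container is named backup):

# run forget (and optionally prune) manually, e.g. once a week
docker exec backup restic forget --keep-daily 7 --keep-weekly 4 --prune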

thierrybla commented 4 months ago

I am also running into this issue.

PRE_COMMANDS with restic unlock also doesn't seem to work for me.

If I unlock the repository from another machine, it starts backing up again for a while, only to get locked again. See:

2024-05-02 13:58:07.091559+02:00 Checking configured repository 'rclone:google-drive:backups/restic' ...
2024-05-02 13:58:12.984656+02:00 unable to create lock in backend: repository is already locked exclusively by PID 1425 on restic-backup-custom-app-859c787754-w8n9n by root (UID 0, GID 0)
2024-05-02 13:58:12.984702+02:00 lock was created at 2024-04-10 05:15:04 (536h43m8.169813168s ago)
2024-05-02 13:58:12.984710+02:00 storage ID de9659b9
2024-05-02 13:58:12.984716+02:00 the unlock command can be used to remove stale locks
2024-05-02 13:58:12.984741+02:00 Could not access the configured repository.
2024-05-02 13:58:12.984748+02:00 Trying to initialize (in case it has not been initialized yet) ...
2024-05-02 13:58:14.908435+02:00 Fatal: create repository at rclone:google-drive:backups/restic failed: config file already exists
2024-05-02 13:58:14.908621+02:00 Initialization failed. Please see error messages above and check your configuration. Exiting.

djmaze commented 3 months ago

@thierrybla It would help if you could identify the original container / job that the lock came from. In your example the lock is quite old; maybe it was a prune which did not finish (because of lack of memory or similar)?

thierrybla commented 3 months ago

@thierrybla It would help if you could identify the original container / job that the lock came from. In your example the lock is quite old; maybe it was a prune which did not finish (because of lack of memory or similar)?

It shouldn't be a lack of memory; I'm running 128 GB of RAM and it's never close to full.