borgmatic-collective / docker-borgmatic

Borgmatic in Docker
GNU General Public License v3.0

Cron jobs creating broken locks with `flock` #334

Closed by ericswpark 3 months ago

ericswpark commented 3 months ago

I'm running into a weird, possibly one-off issue where flock, invoked by supercronic (or s6?) inside the Docker image, leaves behind a broken lock that later invocations ignore. This produces a lot of log spam: borgmatic tries to back up, sees that borg's lockfile is still held, and sends a failure email.

I've tried using flock directly within the container and it works without any problems:

$ flock -n /tmp/test.lock sleep 1000

# In a separate shell (within the same container)
$ flock -n /tmp/test.lock echo "You should not see this message."
$ 

However, with the broken lockfile, it continues anyway:

$ flock -n /tmp/borgmatic.lock echo "You should not see this message."
You should not see this message.
$ 

I've verified that the file exists and has the same permissions and ownership:

67d8a628aaa6:/tmp# ls -al
total 652
drwxrwxrwt    1 root     root           308 Jun 17 21:51 .
drwxr-xr-x    1 root     root           228 Jun 17 09:47 ..
-rw-r--r--    1 root     root             0 Jun 17 09:48 borgmatic.lock
(...)
-rw-r--r--    1 root     root             0 Jun 17 21:51 test.lock
(...)

This is my crontab.txt:

# Every hour, run a create and prune
@hourly flock -n /tmp/borgmatic.lockfile borgmatic prune -v 1 --stats --list 2>&1 && borgmatic create -v 1 --stats --list 2>&1

# Every week, run a check
@weekly flock -n /tmp/borgmatic.lockfile borgmatic check -v 1 2>&1

# Every month, run a compact (delete old chunks)
@monthly flock -n /tmp/borgmatic.lockfile borgmatic compact -v 1 2>&1

Any ideas on why this wouldn't work? I recall it working fine previously.

ericswpark commented 3 months ago

I've just realized that the image updated to patch out supercronic, so maybe that's the cause -- I haven't been keeping up to date with the changes to the Borgmatic Docker image. Do I need to update my crontab configuration or is it a direct transplant?

I still don't understand why s6 would ignore the flock part of the command, though. Surely it would error out rather than run borgmatic.

ericswpark commented 3 months ago

Figured out the issue, see: https://stackoverflow.com/a/69501950

On the hourly line:

@hourly flock -n /tmp/borgmatic.lockfile borgmatic prune -v 1 --stats --list 2>&1 && borgmatic create -v 1 --stats --list 2>&1

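Notice the && between the prune and create operations. The shell splits the line at &&, so flock holds the lock only while prune runs and releases it before create starts. The create step therefore runs unlocked, and any later invocation can take the lock and go through while it is still running.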

The fix:

@hourly flock -n /tmp/borgmatic.lockfile -c 'borgmatic prune -v 1 --stats --list 2>&1 && borgmatic create -v 1 --stats --list 2>&1'
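For anyone who wants to see the difference without touching a real backup, here is a minimal sketch. It assumes the util-linux flock is on the PATH. The temporary lock file from mktemp and the variable names unguarded and guarded are made up for this demo, and true and sleep stand in for the borgmatic commands.

```shell
#!/bin/sh
# Hypothetical lock file used only for this demo.
LOCK=$(mktemp)

# Form 1 (no -c): the shell runs `flock ... true` first, then `sleep 2`.
# flock has already exited and released the lock when sleep starts.
( flock -n "$LOCK" true && sleep 2 ) &
sleep 1
if flock -n "$LOCK" true; then unguarded="free"; else unguarded="held"; fi
wait

# Form 2 (-c): flock runs the whole quoted string and keeps the lock
# until both commands have finished.
flock -n "$LOCK" -c 'true && sleep 2' &
sleep 1
if flock -n "$LOCK" true; then guarded="free"; else guarded="held"; fi
wait

rm -f "$LOCK"
echo "without -c, lock during second command: $unguarded"
echo "with -c, lock during second command: $guarded"
```

With flock installed, the first line should report the lock as free and the second as held, which matches the behaviour I was seeing with the hourly job.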

Closing as user error, sorry!