Closed pgerber closed 3 years ago
Do you have scripts running that use borg break-lock
?
Do you have multiple borg client processes running against that one repo? At the same time?
Analyzing this:
So, as ELD is gone in 159 then something must have removed it (which is why I was asking for break-lock and other borg processes which maybe raced for deleting a stale lock).
Hmm, no matter how this happened, I think you have found a bug here.
borg should not crash when trying to look into a not-any-more-existing exclusive lock directory.
Note: master branch locking code did get recently get improvements/fixes by @alfredo08154711 and it already handles this case.
Hi,
here is alfredo08154711 writing. It's not clear to me whether there is some action required from my side.
BTW, it's also not clear to me into which versions my fixes have been merged. I only know that they have been merged into the master branch about one year ago.
And I'm still waiting on a discussion of my suggested contribution, option --shared for borg with-lock (a PR, IIRC), "Lock the repository with a shared (read) instead of an exclusive (write) lock." I've been using this for my repository rsync copying (redundancy for business continuity) for one year without any observed issues.
Cheers Thomas
Am 9. Juni 2021 14:15:07 MESZ, schrieb TW @.***>:
Note: master branch locking code did get recently get improvements/fixes by @alfredo08154711 and it already handles this case.
-- You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/borgbackup/borg/issues/5828#issuecomment-857643406
-- Diese Nachricht wurde von meinem Android-Mobiltelefon mit K-9 Mail gesendet.
hi @alfredo08154711 - no action strictly required, maybe you want to have a look in the PR.
yes, your fixes are only in master branch (and thus in borg 1.2).
i first wanted to have that tested there before doing big changes in 1.1 locking. not sure how much testing the 1.2 betas got until now - a lot people might hold back and use a stable release only rather than additionally testing a beta.
about the with-lock --shared PR: will try to review it, but I'ld also like other users/developers to review that and give feedback.
do you mean #4120 ?
Vielen Dank, Herr Waldmann.
Eigentlich hätte ich Sie mal gern persönlich kennengelernt - als ich mit den Fixes begann, hatte ich noch einen Job bei AIRBUS in Ulm und pendelte daher wöchentlich durch Ihre Gegend (ich wohne 8km von Aschaffenburg entfernt Richtung Frankfurt). Aber wegen der Bundescorontäne habe ich von der Idee, Sie mal zu treffen, Abstand genommen.)
Aber was nicht ist, kann ja noch werden...
Viele Grüße Thomas Portmann
Am 9. Juni 2021 14:50:34 MESZ, schrieb TW @.***>:
hi @alfredo08154711 - yes, your fixes are only in master branch (and thus in borg 1.2).
i first wanted to have that tested there before doing big changes in 1.1 locking. not sure how much testing the 1.2 betas got until now - a lot people might hold back and use a stable release only rather than additionally testing a beta.
about the with-lock --shared PR: will try to review it, but I'ld also like other users/developers to review that and give feedback.
-- You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/borgbackup/borg/issues/5828#issuecomment-857666406
-- Diese Nachricht wurde von meinem Android-Mobiltelefon mit K-9 Mail gesendet.
Do you have scripts running that use borg break-lock?
No, break-lock is not used.
Do you have multiple borg client processes running against that one repo? At the same time?
Yes.
According to the crontab file two jobs are actually started simultaniously:
# create backup "main" at borg-backup01@backup02.tocco.ch:~
30 21 * * * root borg create --error --compression zstd --lock-wait 43200 --noatime --patterns-from /etc/borg/patterns-main -- 'borg-backup01@backup02.tocco.ch:~'::main-{now:\%Y-\%m-\%dT\%H:\%M:\%S}
# prune backup "main" at borg-backup01@backup02.tocco.ch:~
30 20 * * * root borg prune 'borg-backup01@backup02.tocco.ch:~' --lock-wait 43200 --keep-within 7d --keep-daily 7 --keep-weekly 4 --prefix main-
# create backup "postgres" at borg-backup01@backup02.tocco.ch:~
02 22 * * * root borg create --error --compression zstd --lock-wait 43200 --noatime --patterns-from /etc/borg/patterns-postgres -- 'borg-backup01@backup02.tocco.ch:~'::postgres-{now:\%Y-\%m-\%dT\%H:\%M:\%S}
# prune backup "postgres" at borg-backup01@backup02.tocco.ch:~
30 20 * * 1-5 root borg prune 'borg-backup01@backup02.tocco.ch:~' --lock-wait 43200 --keep-daily 7 --keep-weekly 4 --prefix postgres-
# check repo at borg-backup01@backup02.tocco.ch:~
00 23 * * 0 root borg check --lock-wait 43200 'borg-backup01@backup02.tocco.ch:~' >/dev/null 2>&1
# intended for use by monitoring and health checks
0 12 * * * root borg info --glob '*' --lock-wait 43200 --last 300 --json 'borg-backup01@backup02.tocco.ch:~' >/var/lib/tocco-borg/info-1.tmp && mv /var/lib/tocco-borg/info-1.tmp /var/lib/tocco-borg/info-1
I guess that could explain why it only happens with one server. I can't see any other simultaneous executions of borg.
Note: master branch locking code did get recently get improvements/fixes by @alfredo08154711 and it already handles this case.
I noticed the errors a while back and noticed that refactoring of the locking mechanism being ongoing. So, I refrained from reporting the issue under the assumption that I'd likely be fixed by the change. I assumed the change went into 1.1.16 but I guess was wrong about that.
I'm thinking I should just update the one server to borg 1.2 and see if that helps. Considering that only the secondary backup server is affected, the risk should be manageable.
OK, 1.1 fixed recently, master fixed since longer, all done here.
@pgerber 1.2 still in beta, be careful. testing is very welcome of course.
Have you checked borgbackup docs, FAQ, and open Github issues?
Yes
Is this a BUG / ISSUE report or a QUESTION?
bug
System information. For client/server mode post info for both machines.
Your borg version (borg -V).
Operating system (distribution) and version.
Hardware / network configuration, and filesystems used.
BTRFS on target host
Describe the problem you're observing.
I've been seeing locking-related errors for some time. See backtraces below. So far, I've only seen this issue on one out of two servers. The only difference I'm aware of is that the machine is using BTRFS as filesystem.
Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.
No, but the issue reappears sporadically.
Include any warning/errors/backtraces from the system logs