borgbackup / borg

Deduplicating archiver with compression and authenticated encryption.
https://www.borgbackup.org/

Configure Timeout of Lock acquire #8093

Closed marcohald closed 2 months ago

marcohald commented 7 months ago

Have you checked borgbackup docs, FAQ, and open GitHub issues?

Yes

Is this a BUG / ISSUE report or a QUESTION?

QUESTION

System information. For client/server mode post info for both machines.

Your borg version (borg -V).

borg 1.2.7

Operating system (distribution) and version.

Hardware / network configuration, and filesystems used.

How much data is handled by borg?

Full borg command line that led to the problem (leave out excludes and passwords)

Describe the problem you're observing.

Sometimes some hosts fail with the error borg.locking.LockTimeout: Failed to create/acquire the lock [REPO]/lock.exclusive (timeout). I don't know whether the server hosting the backup repositories is the problem. How long is this timeout, and is it configurable?

Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.

Not really

Include any warning/errors/backtraces from the system logs

using builtin fallback logging configuration
33 self tests completed in 0.14 seconds
SSH command line: ['ssh', 'backupuser@', 'borg', 'serve', '--debug']
Remote: using builtin fallback logging configuration
Remote: 33 self tests completed in 0.09 seconds
Remote: using builtin fallback logging configuration
Remote: Initialized logging system for JSON-based protocol
Remote: Resolving repository path b'/[REPO]'
Remote: Resolved repository path to '/[REPO]'
Remote: Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/borg/remote.py", line 240, in serve
    res = f(**args)
  File "/usr/lib/python3/dist-packages/borg/remote.py", line 368, in open
    self.repository.__enter__()  # clean exit handled by serve() method
  File "/usr/lib/python3/dist-packages/borg/repository.py", line 203, in __enter__
    self.open(self.path, bool(self.exclusive), lock_wait=self.lock_wait, lock=self.do_lock)
  File "/usr/lib/python3/dist-packages/borg/repository.py", line 435, in open
    self.lock = Lock(os.path.join(path, 'lock'), exclusive, timeout=lock_wait).acquire()
  File "/usr/lib/python3/dist-packages/borg/locking.py", line 389, in acquire
    self._wait_for_readers_finishing(remove, sleep)
  File "/usr/lib/python3/dist-packages/borg/locking.py", line 402, in _wait_for_readers_finishing
    self._lock.acquire()
  File "/usr/lib/python3/dist-packages/borg/locking.py", line 148, in acquire
    raise LockTimeout(self.path) from None
borg.locking.LockTimeout: Failed to create/acquire the lock /[REPO]/lock.exclusive (timeout).
Failed to create/acquire the lock /[REPO]/lock.exclusive (timeout).
Borg server: Platform: Linux [BACKUPSERVER] 5.10.0-26-amd64 #1 SMP Debian 5.10.197-1 (2023-09-29) x86_64
Borg server: Linux: Unknown Linux
Borg server: Borg: 1.2.7 Python: CPython 3.9.2 msgpack: 1.0.0 fuse: pyfuse3 3.2.0 [pyfuse3,llfuse]
Borg server: PID: 2477827 CWD: /home/backupuser
Borg server: sys.argv: ['/usr/bin/borg', 'serve', '--restrict-to-path', '/[REPO]']
Borg server: SSH_ORIGINAL_COMMAND: 'borg serve --debug'
Platform: Linux 5.10.0-26-amd64 #1 SMP Debian 5.10.197-1 (2023-09-29) x86_64
Linux: Unknown Linux
Borg: 1.2.7 Python: CPython 3.9.18 msgpack: 1.0.7 fuse: llfuse 1.5.0 [pyfuse3,llfuse]
PID: 1659216 CWD: /root
sys.argv: ['borg', 'create', '--exclude-caches', '--stats', '--debug', '--show-rc', 'ssh://backupuser@/[REPO]::{hostname}-{now:%Y-%m-%d-%H%M%S}', '/etc', '/root/.borgmatic', '/srv/mysqldumps/nightly', '/srv/www']
SSH_ORIGINAL_COMMAND: None

ThomasWaldmann commented 7 months ago

Check the docs about --lock-wait:

https://borgbackup.readthedocs.io/en/stable/usage/general.html

The default wait time is rather short (1s).

If you run multiple borg clients against the same repo, a much longer wait time can make sense to serialise access to the repo.

If only one borg client accesses that repo and you always get locking errors when accessing it, you might also have a leftover lock in that repo, in which case you may need to carefully use borg break-lock (read the docs!).

https://borgbackup.readthedocs.io/en/stable/usage/lock.html#borg-break-lock
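
What a lock wait time controls can be pictured with a small Python sketch. This is not borg's actual implementation: the function names acquire_dir_lock and release_dir_lock and the polling interval are made up for illustration, and the only details taken from the error message above are the lock.exclusive name and the LockTimeout exception name. The sketch assumes the lock is a directory created atomically with mkdir and that the caller keeps retrying until the timeout expires.

```python
import os
import time


class LockTimeout(Exception):
    """Raised when the lock could not be acquired within the timeout."""


def acquire_dir_lock(path, timeout=1.0, sleep=0.1):
    """Create the lock directory `path`, retrying for at most `timeout` seconds.

    Hypothetical helper for illustration only; `timeout` plays the role
    that --lock-wait plays on the borg command line.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # mkdir is atomic: it fails if someone else already holds the lock.
            os.mkdir(path)
            return path
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise LockTimeout(path) from None
            time.sleep(sleep)


def release_dir_lock(path):
    """Release the lock by removing the (empty) lock directory."""
    os.rmdir(path)
```

With a short timeout, a second caller gives up almost immediately while the first still holds the lock; with a longer one it simply waits its turn. A lock that is never released (for example after a crash) makes every later caller time out no matter how long it waits, which is the leftover-lock case.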

marcohald commented 7 months ago

Thank you, I don't know how I missed the config option.

If you run multiple borg clients against the same repo, a much longer wait time can make sense to serialise access to the repo.

Each client has its own repository on the backup server, and the backups are scheduled via cron jobs with a random wait before borg starts. But it is entirely possible that the backup server is under too much load to respond fast enough.

I will close the issue, as I think this will solve my problem.

marcohald commented 7 months ago

I just found one host where even a 10-second timeout does not seem to be enough. Is there a specific debug topic I should specify to get more logs? I normally use borgmatic as a wrapper, but this command was executed directly, without borgmatic, with the same arguments.

root@client:~# borg create --exclude-from /etc/borgmatic/excludes --exclude-caches --lock-wait 10 --debug --show-rc ssh://backupuser@server.sub.example.com//srv/borg/client.sub.example.com::{hostname}-{now:%Y-%m-%d-%H%M%S} /etc /root/.borgmatic /srv/mysqldumps/nightly /srv/www --debug-topic repository
using builtin fallback logging configuration
Enabling debug topic borg.debug.repository
33 self tests completed in 0.09 seconds
SSH command line: ['ssh', 'backupuser@server.sub.example.com', 'borg', 'serve', '--debug', '--debug-topic=borg.debug.repository']
Remote: using builtin fallback logging configuration
Remote: Enabling debug topic borg.debug.repository
Remote: 33 self tests completed in 0.08 seconds
Remote: using builtin fallback logging configuration
Remote: Initialized logging system for JSON-based protocol
Remote: Resolving repository path b'//srv/borg/client.sub.example.com'
Remote: Resolved repository path to '/srv/borg/client.sub.example.com'
Remote: Traceback (most recent call last):

  File "/usr/lib/python3/dist-packages/borg/remote.py", line 240, in serve
    res = f(**args)

  File "/usr/lib/python3/dist-packages/borg/remote.py", line 368, in open
    self.repository.__enter__()  # clean exit handled by serve() method

  File "/usr/lib/python3/dist-packages/borg/repository.py", line 203, in __enter__
    self.open(self.path, bool(self.exclusive), lock_wait=self.lock_wait, lock=self.do_lock)

  File "/usr/lib/python3/dist-packages/borg/repository.py", line 435, in open
    self.lock = Lock(os.path.join(path, 'lock'), exclusive, timeout=lock_wait).acquire()

  File "/usr/lib/python3/dist-packages/borg/locking.py", line 389, in acquire
    self._wait_for_readers_finishing(remove, sleep)

  File "/usr/lib/python3/dist-packages/borg/locking.py", line 402, in _wait_for_readers_finishing
    self._lock.acquire()

  File "/usr/lib/python3/dist-packages/borg/locking.py", line 148, in acquire
    raise LockTimeout(self.path) from None

borg.locking.LockTimeout: Failed to create/acquire the lock /srv/borg/client.sub.example.com/lock.exclusive (timeout).

Failed to create/acquire the lock /srv/borg/client.sub.example.com/lock.exclusive (timeout).
Borg server: Platform: Linux server 5.10.0-26-amd64 #1 SMP Debian 5.10.197-1 (2023-09-29) x86_64
Borg server: Linux: Unknown Linux
Borg server: Borg: 1.2.7  Python: CPython 3.9.2 msgpack: 1.0.0 fuse: pyfuse3 3.2.0 [pyfuse3,llfuse]
Borg server: PID: 2564845  CWD: /home/backupuser
Borg server: sys.argv: ['/usr/bin/borg', 'serve', '--restrict-to-path', '/srv/borg/client.sub.example.com']
Borg server: SSH_ORIGINAL_COMMAND: 'borg serve --debug --debug-topic=borg.debug.repository'
Platform: Linux client.sub.example.com 5.10.0-26-amd64 #1 SMP Debian 5.10.197-1 (2023-09-29) x86_64
Linux: Unknown Linux
Borg: 1.2.7  Python: CPython 3.9.18 msgpack: 1.0.7 fuse: llfuse 1.5.0 [pyfuse3,llfuse]
PID: 428389  CWD: /root
sys.argv: ['borg', 'create', '--exclude-from', '/etc/borgmatic/excludes', '--exclude-caches', '--lock-wait', '10', '--debug', '--show-rc', 'ssh://backupuser@server.sub.example.com//srv/borg/client.sub.example.com::{hostname}-{now:%Y-%m-%d-%H%M%S}', '/etc', '/root/.borgmatic', '/srv/mysqldumps/nightly', '/srv/www', '--debug-topic', 'repository']
SSH_ORIGINAL_COMMAND: None

terminating with error status, rc 2

ThomasWaldmann commented 7 months ago

See my first reply, second half.
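
Before reaching for borg break-lock, it can help to look at what is actually sitting in the repository directory on the server. The following is a read-only Python sketch under stated assumptions: the lock.exclusive name comes from the error message above, while the idea that it is a directory with one entry per holder, and that a lock.roster file containing JSON may sit next to it, is an assumption about the borg 1.x on-disk layout and should be checked against the borg documentation. The function name describe_lock is hypothetical. It deletes nothing.

```python
import json
import os


def describe_lock(repo_path):
    """Report lock-related entries found directly under `repo_path`.

    Hypothetical helper: assumes `lock.exclusive` is a directory and
    `lock.roster` is a JSON file, which may not hold for every borg version.
    """
    exclusive_dir = os.path.join(repo_path, "lock.exclusive")
    roster_file = os.path.join(repo_path, "lock.roster")
    info = {"exclusive_holders": [], "roster": None}

    if os.path.isdir(exclusive_dir):
        # Each entry is expected to name one holder of the exclusive lock.
        info["exclusive_holders"] = sorted(os.listdir(exclusive_dir))

    if os.path.isfile(roster_file):
        try:
            with open(roster_file) as f:
                info["roster"] = json.load(f)
        except (OSError, ValueError):
            info["roster"] = "unreadable"

    return info
```

If this reports a holder while no borg process is running against the repository on any machine, the lock is probably a leftover, and borg break-lock (after reading its documentation) is the supported way to remove it.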

ThomasWaldmann commented 2 months ago

Guess this is solved?