borgbackup / borgstore

experimental storage backend

rclone backend and exception handling #54

Closed ThomasWaldmann closed 1 month ago

ThomasWaldmann commented 1 month ago

I noticed that borgbackup hung when it usually would terminate.

E.g. if it fails to acquire a lock on the repository (because there is already an exclusive lock by some other borg):

LOCK-ACQUIRE: timeout while trying to acquire a lock.
Failed to create/acquire the lock <Store(url='rclone://pcloud:borg2-test2-rclone', levels=[('config/', [0]), ('data/', [2])])> (timeout).

Error:

LockTimeout: Failed to create/acquire the lock <Store(url='rclone://pcloud:borg2-test2-rclone', levels=[('config/', [0]), ('data/', [2])])> (timeout).

If reporting bugs, please include the following:

Traceback (most recent call last):
  File "/Users/tw/w/borg/src/borg/archiver/__init__.py", line 623, in main
    exit_code = archiver.run(args)
  File "/Users/tw/w/borg/src/borg/archiver/__init__.py", line 517, in run
    rc = func(args)
  File "/Users/tw/w/borg/src/borg/archiver/_common.py", line 151, in wrapper
    with repository:
  File "/Users/tw/w/borg/src/borg/repository.py", line 134, in __enter__
    self.open(exclusive=bool(self.exclusive), lock_wait=self.lock_wait, lock=self.do_lock)
  File "/Users/tw/w/borg/src/borg/repository.py", line 186, in open
    self.lock = Lock(self.store, exclusive, timeout=lock_wait).acquire()
  File "/Users/tw/w/borg/src/borg/storelocking.py", line 206, in acquire
    raise LockTimeout(str(self.store))
borg.storelocking.LockTimeout: Failed to create/acquire the lock <Store(url='rclone://pcloud:borg2-test2-rclone', levels=[('config/', [0]), ('data/', [2])])> (timeout).

Platform: Darwin iMac2020.local 24.0.0 Darwin Kernel Version 24.0.0: Mon Aug 12 20:54:30 PDT 2024; root:xnu-11215.1.10~2/RELEASE_X86_64 x86_64
Borg: 2.0.0b11.dev54+gc2436fb2  Python: CPython 3.9.20 msgpack: 1.0.7 fuse: llfuse 1.5.1 [pyfuse3,llfuse]
PID: 16197  CWD: /Users/tw/w/borg
sys.argv: ['/Users/tw/w/borg-env/bin/borg', 'create', 'src', 'src', '--list', '--debug']
SSH_ORIGINAL_COMMAND: None

Usually borg would terminate here due to the exception, but it doesn't. It just sits there until the user presses Ctrl-C:

^CException ignored in: <module 'threading' from '/usr/local/Cellar/python@3.9/3.9.20/Frameworks/Python.framework/Versions/3.9/lib/python3.9/threading.py'>
Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.9/3.9.20/Frameworks/Python.framework/Versions/3.9/lib/python3.9/threading.py", line 1477, in _shutdown
    lock.acquire()
KeyboardInterrupt: 

And then it terminates.

@ncw is that due to the rclone background process?

ncw commented 1 month ago

Looks like it is the background thread which empties the stderr pipe.

I think setting daemon=True on the thread creation should fix it.
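To illustrate the idea, here is a minimal sketch of a thread that empties a child process's stderr pipe and is created with daemon=True. It is not the actual borgstore code: the names drain and start_child_with_drainer are made up for this example, and the child command is a placeholder.

```python
import subprocess
import sys
import threading


def drain(pipe, sink):
    """Read lines until EOF so the child never blocks on a full stderr pipe."""
    for line in iter(pipe.readline, b""):
        sink.append(line.rstrip())
    pipe.close()


def start_child_with_drainer(argv, sink):
    """Hypothetical helper: start argv plus a thread that empties its stderr."""
    proc = subprocess.Popen(argv, stderr=subprocess.PIPE)
    # daemon=True: the interpreter does not wait for this thread at shutdown,
    # so a child that is still running cannot keep the parent from exiting.
    thread = threading.Thread(target=drain, args=(proc.stderr, sink), daemon=True)
    thread.start()
    return proc, thread
```

Without daemon=True, threading._shutdown joins the thread at interpreter exit, and the thread only finishes once the child closes its stderr. That would match the lock.acquire() frame in the Ctrl-C traceback above.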

I can send a PR for this tomorrow if you want?

ThomasWaldmann commented 1 month ago

@ncw that would be great, thanks!

ncw commented 1 month ago

I've created a PR: https://github.com/borgbackup/borgstore/pull/55

I think this should work but I haven't tested it!

I note that this probably means that borgstore is exiting with the backend repository open. This might leak the rclone rcd process in that case (not sure!).

ThomasWaldmann commented 1 month ago

Oh, interesting, I'll check whether borg leaves the repo open. There is a context manager for the repository, but maybe something isn't quite working as expected yet.
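For reference, a rough sketch of the pattern in question: a context manager whose cleanup stops a child process even when the body raises (as with the LockTimeout above). This is only an assumption about how such cleanup could look, not how borg or borgstore implement it; managed_child is a hypothetical name and the 5 second grace period is arbitrary.

```python
import subprocess
import sys
from contextlib import contextmanager


@contextmanager
def managed_child(argv):
    """Hypothetical helper: run argv and stop it when the with-block exits."""
    proc = subprocess.Popen(argv)
    try:
        yield proc
    finally:
        # Runs on normal exit and when the body raises an exception.
        proc.terminate()
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            # The child ignored the terminate request, so force it to stop.
            proc.kill()
            proc.wait()
```

If the exception is raised before the with-block is entered, or the cleanup is skipped for some other reason, the child would be left running, which is the kind of leak worth checking for here.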