Open nomeata opened 1 year ago
Until some recent borg release (see the changelog; my guess is 1.2.x), Ctrl-C on the client side immediately killed both borg and the ssh client subprocess. That was problematic because the ssh connection is needed for a clean shutdown, so it was changed: Ctrl-C now only kills borg and lets the ssh subprocess live on while the borg client is shutting down.
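
To illustrate the general idea of keeping a helper subprocess alive across Ctrl-C, here is a minimal sketch. It is not borg's actual code, and this comment does not say which mechanism borg uses; it only shows one common POSIX approach, which is to start the child in its own session so the terminal's SIGINT does not reach it. The function names are made up for this example.

```python
import os
import subprocess


def spawn_shielded_from_ctrl_c(argv):
    """Start a child in its own session (POSIX only).

    A terminal delivers the Ctrl-C SIGINT to every process in the
    foreground process group. A child placed in a new session is in a
    different process group, so only the parent sees the signal.
    """
    return subprocess.Popen(argv, start_new_session=True)


def in_own_process_group(child):
    """Return True if the child is in a different process group than us."""
    return os.getpgid(child.pid) != os.getpgrp()
```

With this arrangement the parent can catch `KeyboardInterrupt`, finish its own shutdown over the still-open connection, and only then terminate the child itself.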
I am not sure how exactly the server-side sshd and `borg serve` processes behave while shutting down.
Locks: In general, borg has some code to remove the lock in clean shutdown cases as well as for errors/exceptions, but of course it cannot clean up in cases like SIGKILL, server crashes, power failures, etc.
There is also code to remove its own stale locks (that is, locks where borg can detect that the process with the PID that created the lock is definitely not alive any more).
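
As a rough sketch of what "definitely not alive any more" can mean on a POSIX system (an illustration only, not borg's implementation; `pid_alive` and `lock_is_stale` are hypothetical names, and the assumption that a lock records a hostname plus a PID is mine):

```python
import os
import socket


def pid_alive(pid):
    """Return True if a process with this PID exists on the local machine."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is sent
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def lock_is_stale(lock_host, lock_pid):
    """Treat a lock as stale only if it was created on this host by a dead PID."""
    if lock_host != socket.gethostname():
        return False  # cannot judge processes on another machine
    return not pid_alive(lock_pid)
```

Note the conservative default: a lock created on a different host is never reported as stale, because the local process table says nothing about that machine.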
Your problematic repo: is it accessed by multiple borg clients, or is something running in the background that you may have overlooked? `borg break-lock` must not be used while any borg is still using the repo.
It's not used by any other machine or process. I think it can only be the first `borg compact` that was still running.
Have you checked borgbackup docs, FAQ, and open GitHub issues?
Yes
Is this a BUG / ISSUE report or a QUESTION?
I am unsure, feel free to close if nothing actionable comes out of it.
System information. For client/server mode post info for both machines.
Your borg version (borg -V).
Local: borg 1.2.2 Remote: borg1 1.2.3 (on rsync.net)
Operating system (distribution) and version.
Local: NixOS 22.11 Remote: ?
Hardware / network configuration, and filesystems used.
Local: Thinkpad, ext4 Remote: ZFS I believe
How much data is handled by borg?
A ~250G repo
Full borg commandline that led to the problem (leave away excludes and passwords)
I believe what happened was:
1. Ran `borg prune` successfully.
2. Started a `borg compact` run.
3. Interrupted it with Ctrl-C because I wanted to add the `--progress` flag.
4. Ran `borg compact --progress`. It complained about a lock file.
5. Ran `borg break-lock`.
6. Ran `borg compact`. Got various error messages. Annoyingly, I already closed the terminal, sorry…
7. Ran `borg check --last 1`. Got messages about missing segments and other issues.

A subsequent repair job finished, removed some segments with zeros, and afterwards I could compact, but I have lost some faith in this particular repository, and am now creating a new one.
I assume what happened was that the first `borg compact` was actually still running on the server, and then I started the second one and that broke things.
I understand it’s mostly my fault, but maybe there are still some lessons to be learned from this that will prevent others from making that mistake.
Some ideas that would have helped me not break my setup:
- `borg compact` could run with progress by default, at least if the output is a terminal.
- I would have expected Ctrl-C to kill the server-side `borg` reliably, but maybe that was optimistic. I have seen fixed issues related to that, so maybe it is a `borg compact` specific issue? Or I was just unlucky?
- Maybe this could be made more robust? For example, the local `borg` could wait for some kind of explicit confirmation from the other side that it has stopped, and warn loudly if that did not come?
- The lock file could (if it doesn’t already) contain the pid of the process, and the warning about not being able to acquire the lock could indicate whether a process with that pid is running or not (like vim does). Of course, with networked file systems, absence of a running process is not a guarantee, but presence is certainly a sign that one should not `break-lock` too easily. This would be especially useful on rsync.net, where I cannot log in and run `ps` manually.

If none of these ideas are useful feel free to close this issue, and nevertheless thanks for maintaining borg!
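
To make the "wait for explicit confirmation" idea above a bit more concrete, here is a minimal sketch of what I have in mind. It is purely illustrative: the `STOPPED` message, the function names and the plain-socket transport are all assumptions of mine and have nothing to do with borg's real RPC protocol.

```python
import select
import sys
import time


def wait_for_shutdown_ack(conn, timeout, expected=b"STOPPED\n"):
    """Wait up to `timeout` seconds for the peer to confirm it has stopped.

    Returns True only if the expected message arrived in full. Returns
    False on timeout or if the peer closed the connection first.
    """
    deadline = time.monotonic() + timeout
    received = b""
    while len(received) < len(expected):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False  # timed out
        readable, _, _ = select.select([conn], [], [], remaining)
        if not readable:
            return False  # timed out while waiting for data
        chunk = conn.recv(len(expected) - len(received))
        if not chunk:
            return False  # peer closed without confirming
        received += chunk
    return received == expected


def warn_if_unconfirmed(confirmed):
    """Print a loud warning when no confirmation arrived."""
    if not confirmed:
        print("WARNING: remote side did not confirm that it stopped; "
              "it may still hold the repository lock.", file=sys.stderr)
```

The point of the sketch is only that a missing confirmation is treated as "unknown, possibly still running" rather than as success, which is exactly the situation in which `break-lock` is dangerous.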