borg compact, Ctrl-C, Lock file visibility

nomeata commented 1 year ago

Have you checked borgbackup docs, FAQ, and open GitHub issues?

Yes

Is this a BUG / ISSUE report or a QUESTION?

I am unsure, feel free to close if nothing actionable comes out of it.

System information. For client/server mode post info for both machines.

Your borg version (borg -V).

Local: borg 1.2.2 Remote: borg1 1.2.3 (on rsync.net)

Operating system (distribution) and version.

Local: NixOS 22.11 Remote: ?

Hardware / network configuration, and filesystems used.

Local: Thinkpad, ext4 Remote: ZFS I believe

How much data is handled by borg?

A ~250G repo

Full borg commandline that led to the problem (leave away excludes and passwords)

I believe what happened was:

I got a quota warning from rsync.net
I ran borg prune successfully
I started a borg compact run.
It didn’t print anything for a while. I noticed in the logs that there is a --progress flag.
Ctrl-C. The command quit without further output (I think)
Ran borg compact --progress. It complained about a lock file.
Waited for maybe 5 mins or so, in case there is a remote process winding down.
Lost patience, and used borg break-lock
Ran borg compact. Got various error messages. Annoyingly, I had already closed the terminal, sorry…
Ran borg check --last 1. Got messages about missing segments and other issues.

A subsequent repair job finished, removed some segments with zeros, and afterwards I could compact, but I have lost some faith in this particular repository, and am now creating a new one.

I assume what happened was that the first borg compact was actually still running on the server, and then I started the second one and that broke things.

I understand it’s mostly my fault, but maybe there are still some lessons to be learned from this that will prevent others from making that mistake.

Some ideas that would have helped me not break my setup:

borg compact could run with progress by default, at least if the output is a terminal.
I would have expected Ctrl-C to kill the server side borg reliably, but maybe that was optimistic. I have seen fixed issues related to that; so maybe it is a borg compact specific issue? Or I was just unlucky?

Maybe this could be made more robust?

For example, the local borg could wait for some kind of explicit confirmation from the other side that it has stopped, and warn loudly if that did not come?
The lock file could (if it doesn’t already) contain the pid of the process, and the warning about not being able to acquire the lock file could indicate if a process of that pid is running or not (like vim does). Of course with networked file systems, absence of a running process is not a guarantee, but presence is certainly a sign that one should not break-lock too easily. Especially useful on rsync.net where I cannot log in and run ps manually.
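
The first idea above, waiting for an explicit confirmation that the other side has stopped, could look roughly like the following. This is only a sketch under assumptions: the STOP/STOPPED messages and the names `FAKE_SERVER`, `HUNG_SERVER`, `start` and `stop_and_confirm` are made up for illustration and are not borg's actual client/server protocol. A local child process stands in for the remote side.

```python
import subprocess
import sys

# Hypothetical stand-in for the remote side: it acknowledges a STOP request
# before exiting. This is NOT borg's real RPC protocol.
FAKE_SERVER = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    if line.strip() == 'STOP':\n"
    "        print('STOPPED', flush=True)\n"
    "        break\n"
)

# Hypothetical stand-in for a peer that never answers.
HUNG_SERVER = "import time; time.sleep(30)"


def start(code):
    """Start a stand-in peer that talks over stdin/stdout pipes."""
    return subprocess.Popen(
        [sys.executable, "-c", code],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )


def stop_and_confirm(proc, timeout=5.0):
    """Ask the peer to stop; return True only if it confirmed in time."""
    try:
        out, _ = proc.communicate("STOP\n", timeout=timeout)
    except subprocess.TimeoutExpired:
        # Cleanup of the local stand-in only. With a real remote peer,
        # the other side might well still be running at this point.
        proc.kill()
        proc.communicate()
        print(
            "WARNING: no stop confirmation from the other side; "
            "it may still be using the repository.",
            file=sys.stderr,
        )
        return False
    return out.strip().endswith("STOPPED")
```

With something like this, a missing confirmation would be a clear hint not to reach for break-lock right away.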

If none of these ideas are useful feel free to close this issue, and nevertheless thanks for maintaining borg!

ThomasWaldmann commented 1 year ago

Until some recent borg release (see changelog, I guess 1.2.x), Ctrl-C on the client side immediately killed both borg and the ssh client subprocess. That was problematic because the ssh connection is needed for a clean shutdown, so this was fixed to only kill borg and let the ssh subprocess live on while the borg client is shutting down.
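
To illustrate the general mechanism: a terminal delivers Ctrl-C to the whole foreground process group, so a helper subprocess only survives it if it is kept out of that group. One common way to get that in Python on POSIX is sketched below. This is an assumption-laden sketch, not necessarily what borg's code does: `spawn_isolated` and `run_with_clean_shutdown` are hypothetical names, and a small Python child stands in for the ssh client.

```python
import subprocess
import sys


def spawn_isolated(argv):
    """Start a helper in its own session (POSIX).

    A terminal Ctrl-C goes to the foreground process group; a child in a
    new session is not part of it and keeps running.
    """
    return subprocess.Popen(argv, stdin=subprocess.PIPE, start_new_session=True)


def run_with_clean_shutdown(work):
    """Run work(child); on Ctrl-C, still shut the helper down in an orderly way."""
    # Stand-in for the ssh client: it runs until its stdin reaches EOF.
    child = spawn_isolated([sys.executable, "-c", "import sys; sys.stdin.read()"])
    interrupted = False
    try:
        work(child)
    except KeyboardInterrupt:
        # Only this process saw the interrupt; the helper is still alive,
        # so the connection could still be used for a clean shutdown.
        interrupted = True
    finally:
        child.stdin.close()  # EOF tells the stand-in helper to finish
        exit_code = child.wait(timeout=10)
    return interrupted, exit_code
```

Note that this only covers the client side; it says nothing about what sshd and borg serve do on the server once the connection goes away.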

I am not sure how exactly the server-side sshd and borg serve process behave while shutting down.

Locks: In general, borg has some code to remove the lock in clean shutdown cases as well as for errors/exceptions - but of course it cannot clean up in cases like SIGKILL or server crashes, power failures, etc.

There is also code to remove borg's own stale locks (i.e. if borg can detect that the process with the PID that created the lock is definitely not alive any more).
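
As a rough illustration of what "definitely not alive" can mean, here is a minimal sketch. It assumes the lock records a hostname and a PID; the function names are hypothetical and this is not borg's actual lock code. The check is only conclusive when the lock was created on the same host, and it is POSIX-only.

```python
import os
import socket


def pid_is_gone(pid):
    """Return True only if no process with this PID exists on this host (POSIX)."""
    try:
        os.kill(pid, 0)  # signal 0 delivers nothing; it only checks the PID
    except ProcessLookupError:
        return True
    except PermissionError:
        return False  # the process exists but belongs to another user
    return False


def lock_is_definitely_stale(lock_host, lock_pid, this_host=None):
    """A lock counts as stale only if it was taken on this host by a dead PID."""
    if this_host is None:
        this_host = socket.gethostname()
    if lock_host != this_host:
        # Processes on another machine cannot be inspected from here,
        # so the answer has to be "not provably stale".
        return False
    return pid_is_gone(lock_pid)
```

PID reuse means a "still alive" answer can be wrong (some unrelated process may have inherited the number), but that errs on the safe side: the lock is kept rather than removed.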

Your problematic repo, is it accessed by multiple borg clients / is stuff running in the background you maybe have overlooked? borg break-lock must not be used while any borg is still using the repo.

nomeata commented 1 year ago

It's not used by any other machine or process; I think it can only be the first borg compact that was still running.
