borgbackup / borg

Deduplicating archiver with compression and authenticated encryption.
https://www.borgbackup.org/

Add option to pass a pair of file descriptors as alternative to ``--rsh`` #4749

Open horazont opened 5 years ago

horazont commented 5 years ago

Have you checked borgbackup docs, FAQ, and open Github issues?

Yes

Is this a BUG / ISSUE report or a QUESTION?

Feature request, I guess?

System information. For client/server mode post info for both machines.

Your borg version (borg -V).

borg 1.1.10

Operating system (distribution) and version.

Debian GNU/Linux bullseye/sid

Hardware / network configuration, and filesystems used.

N/A

How much data is handled by borg?

N/A

Describe the problem you're observing.

I am writing a few things to document and ease pull-style operation (#900). While doing so, I came across the issue that I somehow need to pass a socket or a pair of file descriptors to borg instead of an rsh command.

Of course, I can emulate that using a shell script like this as RSH:

#!/bin/bash
exec socat "UNIX-CONNECT:$BORGWRAP_SOCKET_PATH" STDIO

However, it would be much easier if we could simply pass file descriptor numbers to --rsh on platforms which support that. Alternatively, pass a path to a unix socket to connect to.
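For readers without socat, the same shim can be sketched in Python. This is only an illustration of what the wrapper script does, not anything borg ships: the names `relay`, `write_all` and `main` are made up here, and `BORGWRAP_SOCKET_PATH` is the environment variable from the script above. It assumes stdin is a pipe or socket, which is how an rsh replacement would normally be started.

```python
#!/usr/bin/env python3
"""Relay stdin/stdout to a UNIX socket, roughly like `socat UNIX-CONNECT:path STDIO`."""
import os
import selectors
import socket
import sys


def write_all(fd, data):
    """os.write may write only part of the buffer, so loop until it is empty."""
    while data:
        data = data[os.write(fd, data):]


def relay(sock, fd_in, fd_out, bufsize=65536):
    """Copy fd_in -> sock and sock -> fd_out until the socket side reaches EOF."""
    sel = selectors.DefaultSelector()
    sel.register(sock, selectors.EVENT_READ, "sock")
    sel.register(fd_in, selectors.EVENT_READ, "in")
    sock_open = True
    while sock_open:
        for key, _ in sel.select():
            if key.data == "in":
                data = os.read(fd_in, bufsize)
                if data:
                    sock.sendall(data)
                else:
                    # Local EOF: stop watching input and tell the peer we are done sending.
                    sel.unregister(fd_in)
                    sock.shutdown(socket.SHUT_WR)
            else:
                data = sock.recv(bufsize)
                if data:
                    write_all(fd_out, data)
                else:
                    sock_open = False
    sel.close()


def main():
    # BORGWRAP_SOCKET_PATH is the variable used by the shell wrapper above.
    path = os.environ["BORGWRAP_SOCKET_PATH"]
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        relay(sock, sys.stdin.fileno(), sys.stdout.fileno())
```

Saved as a script with a final `main()` call, it could be used as the RSH command in the same way as the bash wrapper. The blocking `sendall` keeps the sketch short; a production relay would want non-blocking writes in both directions.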

ThomasWaldmann commented 5 years ago

Are there other tools which take FDs as an option and how is their cli option named?

Guess it wouldn't be too hard to implement that, but it also needs to be tested, so how can one test this?

horazont commented 5 years ago

> Are there other tools which take FDs as an option and how is their cli option named?

I could’ve sworn I saw that, but I haven’t been able to find any example. Tools seem to prefer environment variables for that for some reason (which would be fine by me).

I think it might make more sense to allow passing a UNIX socket instead of a pair of file descriptors, maybe?

> Guess it wouldn't be too hard to implement that, but it also needs to be tested, so how can one test this?

I have looked briefly into the code, and I saw that for testing, you fall back to invoking python with -m borg.archiver instead of invoking borg serve remotely.

In a test, you could let the FD option take precedence over that and spawn a borg serve e.g. using socat EXEC on the side. Although the downside is that the process lifecycle of the borg serve needs to be managed by the test tool. (And, of course, it is not possible to pass options to the borg serve process using this mechanism, which needs to be documented properly.)
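As a rough sketch of the "pair of file descriptors" idea in a test setting: a test harness could create a socket pair, hand one end to a server process as its stdin/stdout, and keep the other end for the client. The helper name `spawn_on_socketpair` is hypothetical, and how borg would accept the remaining descriptor is exactly the open question of this issue, so the sketch stops at the point where the harness holds a connected socket.

```python
import socket
import subprocess


def spawn_on_socketpair(argv):
    """Start argv with one end of a socket pair as its stdin and stdout.

    Returns (process, our_end). The caller owns the process lifecycle,
    which is the downside mentioned above.
    """
    ours, theirs = socket.socketpair()
    proc = subprocess.Popen(argv, stdin=theirs.fileno(), stdout=theirs.fileno())
    theirs.close()  # the child keeps its own duplicates on fd 0 and fd 1
    return proc, ours
```

In a test, `argv` would be whatever starts the server side (for example the `python -m borg.archiver` fallback with `serve`), and `our_end.fileno()` is the descriptor a hypothetical FD option would receive. The harness still has to shut down the socket and wait for the process when the test ends.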

felinira commented 5 years ago

Connecting to a UNIX socket file or connecting to a localhost TCP port would be nicer than file descriptors for most use cases I can think of. Socket files are a bit annoying in that they need to be unlinked before use, and ssh does not reliably unlink socket files (at least for me). But there could be the option of doing both.

Syntax could look something like this:

borg create borg:///run/borg/borg-serve.socket:/path/to/repo::/archive_name
borg create borg://localhost:1030:/path/to/repo::/archive_name

The reasoning behind calling the protocol borg is that the socket would be connected to a borg serve process.

To complement this borg serve could have the option --listen (or something similar)

borg serve --listen /run/borg/borg-serve.socket
borg serve --listen localhost:1030

This behavior would make pull-style backups with borg serve running via an ssh reverse tunnel as easy as this. Example to back up host potato:

borg serve --listen /run/borg/potato.socket &
ssh -o "StreamLocalBindUnlink yes" \
    -R /run/borg/borg-serve.socket:/run/borg/potato.socket potato \
    borg create borg:///run/borg/borg-serve.socket:/backups/potato::new folder1 folder2

or with TCP sockets:

borg serve --listen 127.0.0.1:5060 &
ssh -R 1030:127.0.0.1:5060 potato \
    borg create borg://127.0.0.1:1030:/backups/potato::new folder1 folder2

One could even create an alias/wrapper to all of this. I don't know whether this would still be in the scope of borg:

borg pull potato /backups/potato::new folder1 folder2

could start borg serve on a temporary socket/port, start ssh remote port forwarding on a temporary port, call borg create on the remote host with the appropriate options, wait for the backup to complete and stop borg serve again. This might look out of scope, but borg can already back up to remote ssh repositories and call borg serve there. This means it already has the capability to call ssh. This would in theory just be the "reverse way" of doing borg create. IMHO this would simplify things a lot.
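The lifecycle part of such a wrapper (start the server, run the client, always stop the server) could look roughly like the following. This is a generic sketch, not borg code: `run_with_server` is a made-up helper, and the two argument lists would be the proposed `borg serve --listen ...` and `ssh -R ... borg create ...` commands from above, neither of which exists in borg 1.1.

```python
import subprocess


def run_with_server(server_argv, client_argv, stop_timeout=10):
    """Run client_argv while server_argv is running; return the client's exit code."""
    server = subprocess.Popen(server_argv)
    try:
        # Blocks until the client (e.g. the ssh command) has finished.
        return subprocess.run(client_argv).returncode
    finally:
        # Stop the server whether the client succeeded, failed or raised.
        server.terminate()
        try:
            server.wait(timeout=stop_timeout)
        except subprocess.TimeoutExpired:
            server.kill()
            server.wait()
```

A real wrapper would also have to wait until the server is actually listening before starting the client, and pick and clean up the temporary socket or port.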

ThomasWaldmann commented 5 years ago

@felinira that sounds pretty interesting, but what I don't like about it is the complex syntax for the REPO_ARCHIVE argument. Users would need to cope with that and our code also.

I mean especially the socket file variant (the tcp variant is I guess easier to get right):

borg:///run/borg/borg-serve.socket:/path/to/repo::/archive_name

IMHO it was a mistake to ever have repo and archive name mixed together in one argument (see the complex / error prone parsing code we have for that).

Your syntax suggestion would make this even way more complex than it already is and there would be quite some probability of introducing new parsing issues (and new usability issues on the user side).
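To make the parsing concern concrete, here is what a naive split with Python's standard `urllib.parse` yields for the two proposed forms. `naive_split` is a hypothetical helper for illustration only; it is not how borg parses locations.

```python
from urllib.parse import urlsplit


def naive_split(location):
    """Split 'URL::archive' at the first '::', then let urlsplit handle the URL part."""
    url, _, archive = location.partition("::")
    parts = urlsplit(url)
    return parts.scheme, parts.netloc, parts.path, archive


unix_form = naive_split("borg:///run/borg/borg-serve.socket:/path/to/repo::/archive_name")
tcp_form = naive_split("borg://localhost:1030:/path/to/repo::/archive_name")

# Socket path and repo path end up fused in one path component:
print(unix_form)  # ('borg', '', '/run/borg/borg-serve.socket:/path/to/repo', '/archive_name')
# The TCP form keeps the repo path separate but leaves a stray trailing ':' in netloc:
print(tcp_form)   # ('borg', 'localhost:1030:', '/path/to/repo', '/archive_name')
```

In the socket variant the only separator left between the two paths is a single `:`, which is itself a legal character in file names, so any split there is a guess.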

felinira commented 5 years ago

Yeah, I don't like it either. Of course you could add something like --connect but that would make it inconsistent with ssh://. One could of course add a third alternate syntax to ssh to break it all up.

borg create --connect ssh://backupsrv /path/to/repo/on/backupsrv::archivename
borg create --connect borg://localhost:5060 /path/to/repo/on/backupsrv::archivename

I don't really know if this is better.

horazont commented 5 years ago

Maybe this is something to postpone for an eventual 2.0 version where #948 is also addressed? I can think of two variants of how to support this in that case.

Variants

Variant 1

In that case, there could be a --repository argument which supports URIs:

What obviously is missing is that there’s no way to select the path to the repository for the borg:// protocol. I think in the pull-style scenarios, you’ll want to predetermine the repository on the puller side of things anyways (using --restrict-to-repository). If the borg serve process could be asked which repository it exports, the borg client (running on the pullee) could simply ask the borg server for the path instead of having it set in the URL.

If one wants to support multiple repositories, I think the only way which doesn’t lead to madness (and which is supported with urllib etc.) would be to pass the path to the repository via a query argument (i.e. borg:///path/to/socket?repo=/path/to/repo). That makes it inconsistent with ssh:// and file://, but otherwise parsing will be a PITA.
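A minimal sketch of how the query-argument form could be parsed with the standard library, assuming the `borg://` scheme and the `repo` parameter exactly as written above. `parse_repo_uri` is a hypothetical name; nothing like it exists in borg.

```python
from urllib.parse import parse_qs, urlsplit


def parse_repo_uri(uri):
    """Return (socket_path, repo_path or None) for a borg:///socket?repo=/path URI."""
    parts = urlsplit(uri)
    if parts.scheme != "borg":
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    repos = parse_qs(parts.query).get("repo", [])
    if len(repos) > 1:
        raise ValueError("more than one repo= argument given")
    # No repo= argument: the client would have to ask the server (see above).
    return parts.path, (repos[0] if repos else None)


print(parse_repo_uri("borg:///path/to/socket?repo=/path/to/repo"))
# ('/path/to/socket', '/path/to/repo')
```

Both paths come out cleanly without any custom separator handling, which is the point of moving the repository into the query string.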

Variant 2

Have two arguments, --connect and --repository. This is very similar to what @felinira just proposed.

--connect accepts a URI:

--repository then determines the path to the repository. Note that in this case, ssh:// cannot be used with a repository path (for consistency). Instead, the path component of an ssh:// URI could point to the borg executable (like: ssh://user@server:22/usr/bin/borg).

An archive is then addressed via three parameters:

I find Variant 2 much more consistent and prefer it.

felinira commented 5 years ago

I prefer Variant 2 too. One could add an option like --pull-host <ssh host> which would then replace --connect, run borg serve --restrict-to-repository on a local socket, do ssh remote port forwarding and specify the correct --connect options for borg create on the remote end to connect to the socket. Or have a separate command, not sure. This would make it both very simple to implement a pull-style backup (install borg on both machines, run one command, don't worry about lifecycle management of borg serve processes) and flexible (you can specify ports and socket files if you want and implement the connection between these sockets yourself).

The only issue with all these things is security. There needs to be an explicit security warning that you must not expose borg serve directly to the network. It should be obvious, and it's already possible if you really want to, but this makes it way easier. Maybe the code should explicitly disallow listening on non-local sockets, as the connection isn't even authenticated, let alone encrypted.
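One way such a check might look, as a sketch only: accept absolute socket paths and literal loopback addresses, and reject everything else. `is_local_listen_target` and the `host:port` format are assumptions based on the `--listen` examples earlier in this thread.

```python
import ipaddress


def is_local_listen_target(target):
    """True for an absolute UNIX socket path or a literal loopback 'host:port'."""
    if target.startswith("/"):
        return True  # UNIX socket path; access is governed by file permissions
    host, sep, port = target.rpartition(":")
    if not sep or not port.isdigit():
        return False
    try:
        # Strip the brackets of an IPv6 literal such as [::1].
        return ipaddress.ip_address(host.strip("[]")).is_loopback
    except ValueError:
        # Host names (even "localhost") are rejected here; accepting them
        # would require resolving the name and checking every address.
        return False
```

With this, `127.0.0.1:5060` and `[::1]:5060` pass, while `0.0.0.0:5060` and `localhost:1030` are refused. Being strict about names is a deliberate choice in this sketch, since a name can resolve to a non-local address.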

horazont commented 5 years ago

Your observations sell me even more on Variant 2.

Of course, this means that this needs to be postponed until a breaking release. But since a viable workaround (shellscript + --rsh) exists, I don’t think that’s too bad.

felinira commented 5 years ago

You can even work around it without shell script and --rsh="bash -c \"exec socat [...]\"".

Are there any plans for such a breaking release? Glancing over the issue list I can see quite a few issues tagged breaking. But I understand if this is not really a good enough reason... breaking all wrapper scripts and cron jobs for everyone definitely isn't something to do lightly.

horazont commented 5 years ago

> You can even work around it without shell script and --rsh="bash -c \"exec socat [...]\"".

I did not realize that until now. Thanks. That’ll make things much nicer. Doesn’t even need bash, sh will do.

ThomasWaldmann commented 1 year ago

See #7615 (implement sockets) and #7618 (overlaps with this one).