borgbackup / borg

Deduplicating archiver with compression and authenticated encryption.
https://www.borgbackup.org/

rclone Integration #5324

Open buengese opened 4 years ago

buengese commented 4 years ago

Hi, I'm one of the developers of rclone, a command-line tool written in Go for interacting with various cloud storage providers. I'm wondering whether the team here is interested in some sort of integration with rclone. We have a number of users who use borg with a cloud storage provider mounted via rclone, with varying levels of success. Even though rclone's VFS has become much more mature over time, there is still significant overhead associated with it, so a more direct integration would be beneficial. rclone already has an integration with restic, achieved through a custom HTTP/2 API that we run over stdio. I've only had a quick look at borg's remote repository protocol and am not yet sure whether it would be viable to implement something based on it on rclone's side.

Given the rather obvious absence of any cloud storage integration on borg's side, which may have been a conscious decision, I'd like to know whether this is something you would in principle be interested in before I look into it any further. buengese

ThomasWaldmann commented 4 years ago

I guess it would be interesting if it could be done without many changes to the existing Repository and RemoteRepository classes, e.g. by adding yet another one and triggering its use via the repo URL.

I can help if there are questions about borg internals (that are not already answered in the internals docs), but I don't use cloud storage myself.

buengese commented 4 years ago

Thanks for the info. I'm going to look into how far this is possible with minimal changes to the current RemoteRepository setup.

dragetd commented 3 years ago

Some thoughts I want to drop here, without being deep in the code base of either project. Please excuse me if I am misunderstanding concepts - if that is the case, I will delete my post. :) Background: long-time borg and rclone user, currently using it via rclone fuse/mount, including all the pain. :P

borg has a serve mode where it basically takes commands via SSH and acts as the 'storage engine'. Having code at the receiving end makes it possible to do some operations on the server side without round trips to the client. This of course is not possible with a typical cloud storage.

So one would not be able to use this architecture and would need to hook only into the storage layers. Which prevents some optimizations, but that is a logical consequence.

As far as I know, there is no big abstraction in the storage/file-io code in general in borg. Given the huge number of possible options rclone has, I think a tight integration might be difficult here. But at the same time, rclone already has an architecture for providing backends through other protocols (e.g. SFTP) via its own serve. One idea I had was that we decouple things by having rclone run as its own process and provide a protocol (backend) via rclone serve, which we integrate into RemoteRepository on the borg side or add as a different kind of RemoteRepository. Borg on the client side takes care of deduping, hashing etc., and the borg serve side 'only' needs to support some repository operations.

A typical borg remote repository is configured as ssh://user@host:port/path/to/repo (or without the explicit protocol, but I'd keep it here). And instead, the user could specify either ssh+rclone://user@host:port/remote/path/to/repo or rclone:///remote/path/to/repo

If ssh+rclone is specified, borg connects to the remote and starts a borg serve as usual at first. If only rclone is specified, borg would also start a borg serve, but would need to skip SSH and maybe use TCP locally to talk to borg serve. Alternatively we would only support the ssh+rclone implementation.

borg would look for rclone in the path and then strip the first component of the path, using this as the remote which has to be configured in rclone. Then rclone is launched with a new borg-specific backend, which is passed the remote to use and the path of the repository. The new borg backend also checks whether this remote supports everything we need (without seeking/appending, it would be a PITA).

This rclone backend would have to reimplement the repository configuration and locking mechanisms, the latter probably being a major PITA along this road. What is left is mapping the repository operations to however rclone backends work.
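To make the URL idea above concrete, here is a minimal sketch in Python of how the two proposed forms could be split into SSH details, rclone remote name, and repository path. Nothing here is existing borg code: `RcloneLocation` and `parse_rclone_location` are hypothetical names, and the `rclone://` and `ssh+rclone://` schemes are only the proposal from this comment.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class RcloneLocation:
    """Hypothetical container for the pieces of a proposed rclone repo URL."""
    via_ssh: bool           # True for ssh+rclone://, False for rclone://
    user: Optional[str]
    host: Optional[str]
    port: Optional[int]
    remote: str             # name of the remote configured in rclone
    path: str               # repository path inside that remote


def parse_rclone_location(url: str) -> RcloneLocation:
    parts = urlsplit(url)
    if parts.scheme not in ("rclone", "ssh+rclone"):
        raise ValueError(f"not an rclone location: {url!r}")
    via_ssh = parts.scheme == "ssh+rclone"
    if via_ssh and not parts.hostname:
        raise ValueError("ssh+rclone:// needs a host")
    # The first path component names the rclone remote,
    # everything after it is the repository path on that remote.
    remote, _, path = parts.path.lstrip("/").partition("/")
    if not remote:
        raise ValueError("missing rclone remote name")
    return RcloneLocation(via_ssh, parts.username, parts.hostname,
                          parts.port, remote, path)
```

With this split, `rclone:///remote/path/to/repo` yields the remote `remote` and the path `path/to/repo`, and the `ssh+rclone://` form additionally carries user, host and port for the SSH hop.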

That is basically the idea to work around some of the architecture challenges we have right now in borg. The idea of having a server-side implementation that is more flexible and allows different protocols has been floating around for a while, but was never picked up on. The loose coupling as I described it, might reduce the amount of refactoring needed in borg. So at least I personally would be okay with hacking something specific for rclone into a new 'RemoteRepository' implementation in borg and not try to design some general, flexible backend architecture here.

PS: As I said, I am not deep in either codebase. But maybe someone would be interested in bouncing around ideas via an online video session. I can offer a BigBlueButton instance (browser-based, no login, privacy, open source) and would be available tomorrow (Monday, 5th October) around ~21:00 CEST. Drop a note!

[EDIT] Okay, after talking with someone on IRC and reading some things, the serve 'protocol' is actually simpler than I thought. So maybe a direct 'serve' integration would work.

buengese commented 3 years ago

@dragetd Great to see there is someone else interested in this. Sorry for coming back to you this late. I had already looked at borg's RemoteRepository and it's definitely feasible to implement this in rclone. I think the current approach used by RemoteRepository is relatively viable. As far as I can tell, it works by passing MsgPack data via stdio and through the SSH tunnel to another instance of borg. Most of the repository logic is on the client side. rclone already does something similar for restic (another backup tool): the communication also works through stdio, but uses HTTP/2, with rclone running separately. Admittedly I haven't really worked on this any further despite announcing my intention to do so over a month ago (other projects got in the way). I'd still like to get to this soon, hopefully this weekend, especially if there is also interest in this now.
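The general shape of "serialized request/response messages over a stdio pipe" can be illustrated without borg or rclone. The sketch below is explicitly not borg's wire format: to stay within the Python standard library it uses JSON with a 4-byte length prefix as a stand-in for MsgPack, and the message fields (`id`, `method`, `args`) are made up for illustration.

```python
import io
import json
import struct
from typing import BinaryIO, Optional


def send_message(stream: BinaryIO, message: dict) -> None:
    """Write one message as <4-byte big-endian length><JSON payload>."""
    payload = json.dumps(message).encode("utf-8")
    stream.write(struct.pack(">I", len(payload)) + payload)
    stream.flush()


def recv_message(stream: BinaryIO) -> Optional[dict]:
    """Read one message, or return None if the peer closed the stream."""
    header = stream.read(4)
    if len(header) < 4:
        return None
    (length,) = struct.unpack(">I", header)
    return json.loads(stream.read(length).decode("utf-8"))


# Round trip through an in-memory buffer standing in for a pipe.
pipe = io.BytesIO()
send_message(pipe, {"id": 1, "method": "get", "args": {"key": "abc"}})
pipe.seek(0)
request = recv_message(pipe)
```

In a real setup the two ends would be the stdin and stdout of a child process (or of an SSH session), so either side could be swapped for a different program as long as it speaks the same messages.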

enkore commented 3 years ago

The serve protocol is pretty simple, as you noted, and also fairly stable (because of backwards-compatibility with old borg versions).

One potential drawback of adding repository drivers like this that came up way-back-then was that you probably end up with totally different or slightly different repository formats, so migrating a repository that uses driver A to a repo using driver B would need support from borg (a low-level, object-for-object copy, which is fairly simple, unlike replication, but it would still have to be added).

IIRC back then the idea was to just use "xyz://..." which would invoke something like "borg-driver-xyz" (=> loosely coupled binaries) with the rest of the URI as the sole parameter and use that for RPC.
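As a rough illustration of that loosely coupled scheme, the following Python sketch turns a repository URL into the command line for an external driver. It only reflects the idea as described here: the `borg-driver-` prefix comes from this comment, while `driver_command` is a hypothetical helper and not part of borg.

```python
from typing import List


def driver_command(repo_url: str) -> List[str]:
    """Map 'xyz://rest' to ['borg-driver-xyz', 'rest'] (hypothetical helper)."""
    scheme, sep, rest = repo_url.partition("://")
    if not sep or not scheme:
        raise ValueError(f"no URL scheme in {repo_url!r}")
    # The driver binary is named after the scheme and receives the
    # rest of the URI as its sole parameter.
    return [f"borg-driver-{scheme}", rest]
```

The resulting argument list could then be handed to `subprocess.Popen` with `stdin` and `stdout` set to `subprocess.PIPE`, so the RPC runs over the driver's stdio.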

There were also some daydreams about this then being able to be routed through qubes exec and such with basically no effort, which would have made backups of Qubes machines wayyyyyyy easier (their built-in backup was crap at the time).

luminoso commented 3 years ago

Is there any progress on this?

ThomasWaldmann commented 3 years ago

not AFAICS.

enkore commented 3 years ago

What needs to be done from my PoV to make this happen:

1.) is the hard part, 2.) is a bunch of legwork, 3.) is easy.

git70 commented 2 years ago

Hi

Two questions:

  1. Any updates on this topic?
  2. Do I understand correctly that the end result of this case may be similar to Restic's native Rclone support? https://restic.readthedocs.io/en/latest/030_preparing_a_new_repo.html#other-services-via-rclone

I love Borg and I'm not going to betray him ;) I just want to complete Borg with some cool features for me :)

ThomasWaldmann commented 2 years ago

@git70

  1. not as far as i know
  2. maybe. but not sure whether that can be done with what we have currently.

The easiest workaround is to just use a remote with borg support: borgbase, hetzner storagebox, rsync.net, own machine, ...

git70 commented 2 years ago

A little worse, but I understand that there may be serious reasons and too much work :( The point is, I have a lifetime account on pCloud and I wouldn't have to pay for other services separately. Regards!

ThomasWaldmann commented 1 month ago

After #8332 is finished / merged, integrating rclone would likely become much easier.

Maybe it could be done in a similar way as with restic:

In borg 2.0 (current master branch), things are still a bit complicated as it needs to support old borg repos also, at least as a source for borg transfer.

borg 2.1 is planned as the version when we get rid of all the borg < 2 legacy and remove a lot of code that won't be needed any more (old crypto, old repo code, ...).

ThomasWaldmann commented 1 day ago

@buengese Did you already have a look?

#8332 is merged and the borgstore repo has a PR for a REST client, but it still lacks a REST server.