garethgeorge / backrest

Backrest is a web UI and orchestrator for restic backup.
GNU General Public License v3.0

Feature: centralized backup management #68

Open edersong opened 5 months ago

edersong commented 5 months ago

For me, the big problem with Restic is management: each server it's installed on needs its own management location, which makes it difficult to manage a large set of servers. So I would like to know whether there is a plan for Backrest to manage all the Restic backups from a single WebUI. It could be like what Cockpit does for Linux servers, but even better would be a dashboard where we can follow the latest backup results from all servers.

garethgeorge commented 5 months ago

Really interested to see this come in as a feature request; this is something that I'm thinking about in the background (and something that Backrest intends to be able to support architecturally).

Can you elaborate a bit on your use case? Do you care primarily about being able to view backup status and results in a centralized place? Or do you want to be able to manage backup configurations across a fleet of machines / perform bulk operations?

Vatson112 commented 4 months ago

Hi @garethgeorge!

I am also interested in this feature.

I propose the same design as Bareos.

We will have:

  1. Central Server = backrest
  2. Agents on hosts we want to back up.

The server sends a request to the agent (authenticated by mTLS) with the restic config and some pre/post scripts. The client then backs up directly to the repository backend, or we can use rest-server and back up to the central server using the REST protocol.

We could also use an agentless setup, i.e. send restic commands over an SSH connection. But there are caveats, e.g. long-lived SSH sessions may be interrupted by the SSHD configuration.
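
To make the push model above concrete, here is a minimal sketch in Go (not Backrest's actual API) of the central server dispatching a backup job to an agent over mTLS. The JobRequest fields, certificate file names, agent address, and /run-backup endpoint are all hypothetical.

```go
package main

import (
	"bytes"
	"crypto/tls"
	"crypto/x509"
	"encoding/json"
	"log"
	"net/http"
	"os"
)

// JobRequest is a hypothetical payload the central server pushes to an agent.
type JobRequest struct {
	Repo       string   `json:"repo"`        // e.g. a rest-server or rclone repository URI
	Paths      []string `json:"paths"`       // directories the agent should back up
	PreScript  string   `json:"pre_script"`  // run before restic
	PostScript string   `json:"post_script"` // run after restic
}

func main() {
	// Load the server's client certificate and the CA used to verify agents (mutual TLS).
	cert, err := tls.LoadX509KeyPair("server-client.crt", "server-client.key")
	if err != nil {
		log.Fatal(err)
	}
	caPEM, err := os.ReadFile("agent-ca.crt")
	if err != nil {
		log.Fatal(err)
	}
	pool := x509.NewCertPool()
	pool.AppendCertsFromPEM(caPEM)

	client := &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{
				Certificates: []tls.Certificate{cert}, // presented to the agent
				RootCAs:      pool,                    // used to verify the agent's certificate
			},
		},
	}

	body, _ := json.Marshal(JobRequest{
		Repo:  "rest:https://backup.example.com/my-repo",
		Paths: []string{"/etc", "/var/lib/app"},
	})
	resp, err := client.Post("https://agent-1.internal:9090/run-backup", "application/json", bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	log.Println("agent responded:", resp.Status)
}
```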

garethgeorge commented 4 months ago

Interesting, when I'd considered this feature in the past I'd imagined something along the lines of:

A central server that manages configuration and replicates logs from a collection of workers. Each worker receives config updates from the central server when an admin user changes settings. Workers accept configuration updates and schedule operations locally. Each worker pushes copies of each operation log update to the central server (in addition to maintaining a local operation log, so that backup history can be viewed locally on the worker for that worker only).

I'll read through the Bareos docs. Do you see strong advantages one way or the other w.r.t. the central server being responsible for pushing commands to each of the workers? I have some concern that the central server becomes very highly privileged if it's SSHing in and running backup operations. From an implementation perspective, though, it might be very simple to just open up to running restic commands over SSH (as you mention) and to support scheduling operations in parallel such that backups can be run on multiple machines at the same time (the constraint would likely be one backup per repository at a time).
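
As a rough illustration of the constraint mentioned above (backups for many machines in parallel, but at most one operation per repository at a time), here is a small Go sketch; the host/repo names and the runBackup stub are hypothetical, not Backrest code.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// repoLocks hands out one mutex per repository so that plans sharing a repo
// serialize on it while plans for different repos run concurrently.
type repoLocks struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func (r *repoLocks) forRepo(repo string) *sync.Mutex {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.locks == nil {
		r.locks = make(map[string]*sync.Mutex)
	}
	if _, ok := r.locks[repo]; !ok {
		r.locks[repo] = &sync.Mutex{}
	}
	return r.locks[repo]
}

// runBackup is a stand-in for running restic on the remote host (e.g. over SSH).
func runBackup(host, repo string) {
	fmt.Printf("backing up %s -> %s\n", host, repo)
	time.Sleep(time.Second)
}

func main() {
	locks := &repoLocks{}
	jobs := []struct{ host, repo string }{
		{"web-1", "repo-a"},
		{"web-2", "repo-a"}, // serializes with web-1 (same repo)
		{"db-1", "repo-b"},  // runs in parallel (different repo)
	}

	var wg sync.WaitGroup
	for _, j := range jobs {
		wg.Add(1)
		go func(host, repo string) {
			defer wg.Done()
			l := locks.forRepo(repo)
			l.Lock()
			defer l.Unlock()
			runBackup(host, repo)
		}(j.host, j.repo)
	}
	wg.Wait()
}
```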

edersong commented 4 months ago

> A central server that manages configuration and replicates logs from a collection of workers. Each worker receives config updates from the central server when an admin user changes settings. Workers accept configuration updates and schedule operations locally. Each worker pushes copies of each operation log update to the central server (in addition to maintaining a local operation log, so that backup history can be viewed locally on the worker for that worker only).

That's what I desire. Currently I'm using UrBackup, which has centralized administration, but I think Restic is more modern in terms of backup technology; it just doesn't have centralized management yet, which makes managing backups across a farm of servers difficult.
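
As a loose sketch of the log replication part of that design, the Go snippet below shows a worker mirroring each operation log update to the central server in addition to keeping its local log. The OperationLogEntry shape and the /api/operations endpoint are assumptions, not Backrest's real data model.

```go
package main

import (
	"bytes"
	"encoding/json"
	"log"
	"net/http"
	"time"
)

// OperationLogEntry is a hypothetical record describing one backup operation.
type OperationLogEntry struct {
	Worker    string    `json:"worker"`
	Plan      string    `json:"plan"`
	Repo      string    `json:"repo"`
	Status    string    `json:"status"` // e.g. "success" or "error"
	Timestamp time.Time `json:"timestamp"`
}

// pushToCentral mirrors one entry to the central server after it has been
// appended to the worker's own local operation log.
func pushToCentral(centralURL string, entry OperationLogEntry) error {
	body, err := json.Marshal(entry)
	if err != nil {
		return err
	}
	resp, err := http.Post(centralURL+"/api/operations", "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	return nil
}

func main() {
	entry := OperationLogEntry{
		Worker:    "worker-1",
		Plan:      "daily",
		Repo:      "repo-a",
		Status:    "success",
		Timestamp: time.Now(),
	}
	if err := pushToCentral("https://central.example.com", entry); err != nil {
		log.Println("failed to replicate operation log entry:", err)
	}
}
```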

Nebulosa-Cat commented 3 months ago

For my example, I have 1 Raspberry Pi and 4 VPSes (Debian and Ubuntu, x86 and armv8) that need restic backups. I use rclone to create a Dropbox remote plus another encrypted remote on top of it, and each of my machines has its own restic repo, so the restic repository is rclone:the-encrypt-one-name:hostname, for example rclone:abc-encrypt:raspberry-pi-backup, rclone:abc-encrypt:vps-1, rclone:abc-encrypt:vps-2 ...

For my usage scenario, I hope that the Backrest I run on the Raspberry Pi is the main control device, and the ones running on other devices/VPS are clients, and all control is done through Backrest on the pi.

And in terms of UI, my backup mode is to back up once a day (retain up to 30 items), and back up once a week (never delete), so the structure I expect is like this:

Plan:
raspberry-pi
- Daily-Plan
- Week-Plan
VPS-1
- XXX
- YYY
VPS-2
- XXX
- YYY

For this kind of multi-machine management, I think it will need some custom folder tree in which users can organize their plans.

oliverjhyde commented 3 months ago

This would be a killer feature. At the moment I'm using Synology Active Backup for Business to back up and deduplicate across 12 Windows machines, but there are a couple of issues:

Being able to use Backrest to centrally manage the backup configuration (scheduling, directories), with a status page to know if something is failing or has missed its schedule for x interval, would be fantastic.

Even better if the local web interface could be used to restore an older version of a file (either to a new location, as is currently possible, or over the original, as #118 suggests making possible). Ideally, if centrally managed, the configuration couldn't be changed here, though this could be locked down by user account?

garethgeorge commented 2 months ago

Hey all, updating this thread as it's a frequently requested feature and it's also a capability I want for my own systems -- likely looking at prototyping this in the near term.

I'm investigating a few avenues for implementation

  1. As a cloud service that's user-deployable (e.g. think Terraform configs provided). There's some value here in that a savvy user can deploy this in a serverless model and only pay for what they use (it doesn't need to be running all the time).
  2. As an always-running service, e.g. with a monitor and daemon model. In this model my main concern is making it easy for daemons (possibly behind complicated firewalls) to establish a connection to, and also receive commands from, the monitor. I'm vaguely interested to investigate whether something like http://libp2p.io is a good fit to solve some of the networking problems here re: firewall hole punching.

To provide some design details -- this will likely look something like:

I'm slightly leaning towards the daemon / monitor process model because it's more in line with the self-hosted ethos. There are also some interesting possibilities to examine in the future here, e.g. centralizing some operations: run backups on daemon processes (with read-only credentials to repos) but run prune operations only on the trusted monitor process. I'm still thinking through what this might look like / how it'd be configured. Perhaps a concept of a meta-plan is needed to logically group plans across multiple nodes.

edersong commented 2 months ago

Hello @garethgeorge, thank you for the feedback! Of the options you provided, I think option 2 would be better because, in my case at least, I run all my services locally and would not use a cloud service just for monitoring, much less pay for that service. Count me in as a beta tester and for giving feedback. ;-)

brandonkal commented 2 months ago

P2P is not necessary for this use-case. I suggest:

  1. Users deploy the backrest binary to all nodes that should back up.
  2. The only requirement is that one backrest deployment is accessible by all nodes. The main backrest should therefore be accessible via the internet, or the user can choose their own network infrastructure (Tailscale, WireGuard, Netbird, etc.).
  3. Each node is configured to register itself with the main backrest node. It then polls the main node API, e.g. GET /api/backrest-config?node=node-uuid, at a set interval. 10s or even longer is fine as this is just config.
  4. In the main backrest web UI, you can assign plans to nodes. When a node requests its configuration, it gets only the plans it is assigned and the repositories those plans depend on.
  5. Nodes push progress updates to the central server: POST /api/backrest-logs?node=node-uuid. (A rough sketch of the polling loop from step 3 follows below.)
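
A rough sketch of the polling loop from step 3, in Go: each node periodically fetches the plans assigned to it from the main backrest node. The NodeConfig shape mirrors the proposal above but is hypothetical, as is the exact endpoint behaviour.

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
	"time"
)

// NodeConfig is a hypothetical response listing only the plans assigned to this node.
type NodeConfig struct {
	Plans []struct {
		Name     string   `json:"name"`
		Repo     string   `json:"repo"`
		Paths    []string `json:"paths"`
		Schedule string   `json:"schedule"`
	} `json:"plans"`
}

func fetchConfig(mainURL, nodeUUID string) (*NodeConfig, error) {
	resp, err := http.Get(mainURL + "/api/backrest-config?node=" + nodeUUID)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var cfg NodeConfig
	if err := json.NewDecoder(resp.Body).Decode(&cfg); err != nil {
		return nil, err
	}
	return &cfg, nil
}

func main() {
	const mainURL = "https://backrest.example.com"
	const nodeUUID = "node-1234"

	// Poll at a relaxed interval; this is only configuration, so 10s or much
	// longer is fine, as noted in step 3.
	for range time.Tick(30 * time.Second) {
		cfg, err := fetchConfig(mainURL, nodeUUID)
		if err != nil {
			log.Println("config poll failed:", err)
			continue
		}
		log.Printf("received %d assigned plan(s)", len(cfg.Plans))
		// ...apply the config and (re)schedule local backup operations...
	}
}
```
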
asitemade4u commented 2 months ago

+1

Emiliaaah commented 1 month ago

I definitely like all the ideas presented so far, but I thought I'd think out loud and share a few of my own thoughts.

For syncing the nodes' config, just using an API endpoint polled at a set interval is probably fine, but I'd personally also like to be able to manually run actions on those agents. Using this same model for that, but with some sort of action queue, would probably work fine too. It got me thinking, however: wouldn't something like WebSockets be more ideal for such a use case?

Using WebSockets would have the benefit of not having to constantly poll one or more endpoints every x seconds, especially if we'd need separate endpoints for the config, manual actions, etc. It would also avoid a delay between triggering an action and it actually being performed on the node (assuming it can perform that action at that time).

garethgeorge commented 1 month ago

Hey all, thanks for all the interest in this issue -- just updating to say that steady progress is being made toward supporting centralized backup management. Much of the refactoring (and the migrations) in the 1.0.0 release is focused on readying the Backrest data model to support operations in repos that may have been created by other installations, and on correctly tracking those operations.

On the networking front: I'm still investigating here. Backrest uses gRPC under the hood, which is natively HTTP/2. Because connectivity / syncing operations will happen on the backend, we're not restricted to web technologies, e.g. WebSockets. I agree that polling is not the way we want to go; TCP keep-alive is much cheaper than repeatedly re-establishing connections (especially if they are HTTPS -- and they should be!).

I'm hoping to find a good OSS option that I can shim gRPC requests onto (such that they can be initiated by the hub and sent to the clients -- which really looks like some sort of inversion layer where clients will actually be establishing and "keeping alive" TCP channels to the backrest hub). I think https://libp2p.io/ may have some capabilities here (though I do not want to pull in any of the mesh networking / connectivity to the ipfs swarm from that project) but I'm wanting to find simpler alternatives -- which could ultimately look like building it myself!
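
To illustrate the "inversion" idea: below is a bare-bones Go sketch (plain HTTP rather than gRPC, with hypothetical addresses and handlers; not Backrest's implementation) of a client that dials out to the hub, keeps that TCP connection alive, and then serves the requests the hub sends back over it. The same pattern would generalize to gRPC by handing the dialed connection to a gRPC server instead of http.Serve.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
)

// oneConnListener adapts a single, already-established connection into a
// net.Listener so that an HTTP (or gRPC) server can run on top of it.
type oneConnListener struct {
	conn net.Conn
	done chan struct{}
}

func (l *oneConnListener) Accept() (net.Conn, error) {
	if l.conn != nil {
		c := l.conn
		l.conn = nil
		return c, nil
	}
	<-l.done // block further Accepts until the listener is closed
	return nil, net.ErrClosed
}

func (l *oneConnListener) Close() error   { close(l.done); return nil }
func (l *oneConnListener) Addr() net.Addr { return &net.TCPAddr{} }

func main() {
	// Client side: dial out through the firewall to the hub...
	conn, err := net.Dial("tcp", "hub.example.com:9898")
	if err != nil {
		panic(err)
	}
	// ...then act as the *server* on that dialed connection, so the hub can
	// send commands (e.g. "run backup now") without ever dialing in.
	mux := http.NewServeMux()
	mux.HandleFunc("/run-backup", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "backup started")
	})
	http.Serve(&oneConnListener{conn: conn, done: make(chan struct{})}, mux)
}
```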

Another problem space I'm still giving thought to is the relationship between the hub and clients. In particular, the model I imagine will be common is many client devices backing up to a single repo. In this case, I feel that the hub should be able to centrally coordinate maintenance operations, e.g. "forget" and "prune" execution. I'm considering here whether:

  1. each client should run its own forget/prune on the repos it uses, or
  2. the hub should hold the repo configs and run forget/prune centrally on behalf of clients.

The latter approach has the disadvantage that the hub will need access to a repo config for each repo used by a client, BUT it also has the significant advantage that clients may be read-only (e.g. you can centralize trust in the hub with many low-trust clients; this protects against ransomware. In my case I'd likely run a low-cost IPv6 VPS dedicated to this purpose). Not yet sure what will be best here, but I am leaning towards the latter option.

jcunix commented 1 month ago

Very happy you are going down this path!

brandonkal commented 1 month ago

I don't want to have to depend on persistent TCP connections or WebSockets for this functionality. While that is less work than polling frequently, it limits the usefulness of backrest for bandwidth-limited clients, IoT, edge, etc. A lot of the things that would be useful to centrally manage with this project really only need to check their config before they run a backup task, once a day at most. For SIM connections that bill by connection time and bandwidth, the minimal benefit (instant config updates) would not outweigh the high cost incurred.

swartjie commented 1 month ago

@brandonkal I disagree. If data usage is an issue, you're not going to be running backups over that connection anyway. Polling is fine, I think, since it does update; the issue with polling, though, can be the delay in actions. The TCP & WebSockets route is nice since actions are not as delayed, which improves the UI experience.

jcunix commented 3 days ago

Hi, just thought I'd check in. Any updates on progress? Very interested in this functionality.