elemental-lf / benji

Benji Backup: A block-based deduplicating backup software for Ceph RBD images, iSCSI targets, image files and block devices
https://benji-backup.me

Coordination with Velero #149

alexander-bauer commented 1 year ago

Hi @elemental-lf, to cut to the chase, I'm looking for help while I write some tooling. I'm hoping to put together something robust and useful enough that other folks can take advantage of it as well, and I want to make sure I'm on the right track before I invest too much time.

Background and Motivation

I have an existing (homelab) Kubernetes cluster that is ultimately backed by Ceph, managed by Rook. The storage is attached to the same execution nodes. Right now, I use Velero to schedule and manage my backups. Velero is a tool largely aimed at making Kubernetes resources restorable to the same or other clusters, and is not especially interested in backups per se: it can be made to create VolumeSnapshots (and VolumeSnapshotContents) using CSI, but doesn't muck with exporting those snapshots.

I think that approach makes a lot of sense for managed Kubernetes instances, or instances backed by a robust storage medium with its own replication capabilities. Indeed, it would be great if I had a couple dozen hosts to handle replication at the Ceph level.

As far as off-medium (and off-site) backups go, it's a lot more cost-effective to run a standalone Minio pod backed by an external drive, and call that "archival object storage." (Especially with something like rclone to Backblaze B2 for off-site replication.)

Benji strikes me as well-organized and well-regarded, and as operating at the perfect layer to fill in the gap for me.

Where the Existing Tooling Clashes

As far as I'm able to tell, a typical scheduled Velero backup grabs copies of most Kubernetes resources (as returned by the API) and serializes them to S3. PVCs are special: Velero optionally injects a command into the attached pod (such as fsfreeze), asks CSI to create a VolumeSnapshot, waits for it to complete (and for the VolumeSnapshotContent to be available), injects another pod command, and then serializes the VolumeSnapshot and VolumeSnapshotContent objects to the archive.
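To make the flow above concrete, here is a minimal sketch of the two objects Velero creates and waits on, modelled as plain dicts. The shapes follow the `snapshot.storage.k8s.io/v1` API; the names (`demo-pvc`, the snapshot class) are made up for illustration, and the actual create/poll calls against the API server are deliberately left out:

```python
# Sketch of the CSI snapshot objects in Velero's PVC flow. Dict shapes
# follow the snapshot.storage.k8s.io/v1 API; names are illustrative.

def make_volume_snapshot(name, namespace, pvc_name, snapshot_class):
    """Build the VolumeSnapshot manifest that CSI is asked to fulfil."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "volumeSnapshotClassName": snapshot_class,
            "source": {"persistentVolumeClaimName": pvc_name},
        },
    }

def snapshot_is_ready(snapshot):
    """True once the driver has bound a VolumeSnapshotContent and marked the
    snapshot usable, i.e. the point at which the pod can be unfrozen and both
    objects serialized to the archive."""
    status = snapshot.get("status", {})
    return bool(status.get("readyToUse")) and "boundVolumeSnapshotContentName" in status
```

In a real script the polling loop would fetch the object via the Kubernetes API (e.g. the kubernetes Python client's `CustomObjectsApi`) and call `snapshot_is_ready` on each iteration.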

Of course, the VolumeSnapshotContent contains a reference to the saved data on the underlying storage layer (in this case, RBD), but that data is not replicated to archive.

Benji is very well positioned to take that backup model and shore up the final piece: replicating the underlying RBD snapshot to archival storage.

The clash with the existing tooling is that the existing scripts seem tuned for Benji as the primary backup provider, responsible for freezing the filesystems, taking the snapshots, and managing their lifecycles.

Proposition (i.e. please help me do this)

I think that there's no fundamental disagreement here, just a need for a purpose-built script. Ideally, it'd be one general and robust enough to be included in Benji's standard distribution, with associated documentation to help out anyone who may be following down the same path that I am.

So: I want to write a script along the lines of the existing backup_pvc.py, which crawls through existing VolumeSnapshot objects and ensures that the corresponding VolumeSnapshotContents (or at least the ones on RBD) are replicated to archival storage.
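A hypothetical sketch of that crawl, assuming the default Ceph CSI RBD driver name (`rbd.csi.ceph.com`). It operates on already-fetched API objects as plain dicts, so the listing step (e.g. via the kubernetes Python client) is left out and the matching logic stays self-contained:

```python
# Pair each bound VolumeSnapshot with its VolumeSnapshotContent and keep
# only the ones provisioned by the Ceph RBD CSI driver. Dict shapes
# mirror the snapshot.storage.k8s.io/v1 API.

RBD_DRIVER = "rbd.csi.ceph.com"  # assumed default Ceph CSI RBD driver name

def rbd_snapshot_handles(snapshots, contents):
    """Yield (namespace/name, snapshotHandle) for ready, RBD-backed snapshots."""
    by_name = {c["metadata"]["name"]: c for c in contents}
    for snap in snapshots:
        status = snap.get("status", {})
        content = by_name.get(status.get("boundVolumeSnapshotContentName", ""))
        if not content or not status.get("readyToUse"):
            continue  # not yet bound or not ready; nothing to replicate
        if content["spec"].get("driver") != RBD_DRIVER:
            continue  # skip CephFS or other drivers
        handle = content.get("status", {}).get("snapshotHandle")
        if handle:
            meta = snap["metadata"]
            yield f"{meta['namespace']}/{meta['name']}", handle
```

Each yielded `snapshotHandle` encodes which Ceph snapshot backs the content; a full script would map that to an RBD image/snapshot and hand it to Benji for replication.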

I think this should be as simple as:

Where I expect to find issues:

Thanks for reading! Sasha

elemental-lf commented 1 year ago

Thanks for the write-up, Sasha. I actually thought that Velero was also taking care of the actual volume content, but live and learn.

vriabyk commented 1 year ago

Hi people, we are interested in this feature too.

Currently we are facing an issue when using Benji for backing up k8s PVCs (Ceph CSI). Benji creates an RBD snapshot in Ceph and then uses it to create further incremental backups. But when someone deletes a k8s PVC, Ceph CSI successfully deletes the PVC/PV objects from k8s while the RBD image gets stuck in the RBD trash, because Ceph can't delete an image which has snapshots. When the number of such images in the trash grows to 1025 (not sure if that is a configurable limit or not), Ceph CSI stops provisioning new PVCs at all.
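For context, one manual workaround is to restore the stuck image out of the trash, purge its snapshots, and then delete it for real. Below is a sketch that only builds the `rbd` CLI invocations (it assumes the current `rbd trash restore --image` syntax; the pool, image id, and restored name are illustrative, and actually running the commands, e.g. with `subprocess.run`, is left to the caller):

```python
# Build the rbd CLI commands needed to clean one image stuck in the RBD
# trash because it still has (Benji-created) snapshots. Illustrative only:
# commands are returned, not executed.

def trash_cleanup_commands(pool, image_id, restored_name):
    """Return the rbd commands: restore from trash, purge snapshots, delete."""
    return [
        # pull the image back out of the trash under a temporary name
        ["rbd", "trash", "restore", "--image", restored_name, f"{pool}/{image_id}"],
        # remove the snapshots that blocked deletion in the first place
        ["rbd", "snap", "purge", f"{pool}/{restored_name}"],
        # now a normal delete succeeds instead of parking the image in trash
        ["rbd", "rm", f"{pool}/{restored_name}"],
    ]
```

This is exactly the kind of lifecycle handling that would go away if the snapshots were managed through VolumeSnapshotContent, as proposed below.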

I created a feature request for the Ceph CSI developers, asking them to implement at least an option to remove RBD snapshots before deleting the k8s objects. But they don't want to do so.

Therefore we are looking for a way to work around it. If Benji works with VolumeSnapshotContent instead of accessing Ceph directly, then Ceph CSI should clean up everything properly and we won't face stuck images in the trash.

@elemental-lf please let me know if this is clear and if you need any more details or assistance.

elemental-lf commented 1 year ago

@vriabyk my plan is to pull the whole workflow of making a snapshot, getting the differences, and then doing the actual backup into Argo Workflows. That way it would be easier to extend the workflow to use VolumeSnapshotContent. The base for this just landed in master. Currently it's an almost 1:1 translation of the old CronJob-based setup, but you could start from there.