backube / volsync

Asynchronous data replication for Kubernetes volumes
https://volsync.readthedocs.io
GNU Affero General Public License v3.0

Option to remove data on replication destination prior to restoring data #555

Open

onedr0p commented 1 year ago

Describe the feature you'd like to have.

Add an option to the data movers that removes all data on the ReplicationDestination prior to restoring data from a backup.

---
apiVersion: volsync.backube/v1alpha1
kind: ReplicationDestination
metadata:
  name: datavol-dest
spec:
  trigger:
    manual: restore-once
  restic:
    repository: restic-repo
    destinationPVC: datavol
    # Add a new copy method of DirectDelete or something
    copyMethod: DirectDelete
    # or a new option
    delete: true

What is the value to the end user? (why is it a priority?)

When you restore data to a PVC that already contains data, the existing data is not removed. This is undesirable when you are restoring and want the destination data to exactly match the source data.

How will we know we have a good solution? (acceptance criteria)

An option exists to remove existing data on the destination PVC prior to a restore.

Additional context

Imagine you destroy your cluster and want to recreate it. With GitOps tools like Flux or Argo, the bootstrap process usually involves spinning everything up in the cluster at once, which means applications will start and taint their PVCs with freshly written data. An option to delete the data in the PVC would be excellent when you scale down the workload to restore data.

tesshuflower commented 2 months ago

We've been reluctant to widely support deleting temporary PVCs because, with other movers (such as rsync-tls, where we expect to replicate many times on a schedule), keeping the data in place is desirable so that the entire contents of the PVC do not need to be copied each time. If a user wants to make sure the PVC is gone entirely, they can remove the ReplicationDestination or the PVC itself.

However, for a restic-specific restore scenario, it does make sense that if you want to restore again, you may want to clean up files that were deleted at the source. It looks like restic has a new feature (https://github.com/restic/restic/issues/2348) that we could possibly leverage. @onedr0p, would this cover your scenario if we were to expose this feature when we upgrade to the latest restic (v0.17.0)?
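For reference, the restic feature linked above landed as a `--delete` flag on `restic restore` in v0.17.0: it removes files in the restore target that are not present in the snapshot. A minimal sketch of what the mover would run, assuming restic >= 0.17.0 and the usual repository environment variables; `/data` stands in for the mounted destination PVC:

```shell
# Assumes RESTIC_REPOSITORY and RESTIC_PASSWORD are set in the environment.
# Restore the latest snapshot into the destination PVC mount, deleting any
# files in the target that do not exist in the snapshot.
restic restore latest --target /data --delete
```

Exposing this would likely be a boolean on the restic destination spec rather than a new copyMethod, but that is a design detail for the maintainers.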

One thing to keep in mind: this won't necessarily help if you're using the volume populator, since the provisioned PVC is populated in a one-time operation (using whatever is the latest snapshot from the ReplicationDestination at that moment) and will not be updated if you run another sync in your ReplicationDestination.
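For context, the volume populator flow provisions a fresh PVC directly from the ReplicationDestination, so there is no pre-existing data to delete. A rough sketch, reusing the `datavol-dest` name from the example above (the size and access mode are assumptions):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: datavol
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  # Populate this new PVC from the latest snapshot held by the
  # ReplicationDestination at provisioning time (one-time operation).
  dataSourceRef:
    apiGroup: volsync.backube
    kind: ReplicationDestination
    name: datavol-dest
```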

onedr0p commented 2 months ago

I believe that restic feature would cover my use-case.

I haven't thought about this in a while because the volume populator feature has pretty much killed the need to restore data over a PVC with existing data. With the volume populator, I just need to nuke the workload from my cluster, add it back, and let volsync restore it.

onedr0p commented 2 months ago

This feature would still be useful for me in cases where I want to restore volumes ad hoc without the volume populator.