clincha-org / clincha

Configuration and monitoring of clinch-home infrastructure
https://clinch-home.com

Configure backup policies #83

Closed clincha closed 1 year ago

clincha commented 1 year ago

When I have a backup solution in place:

clincha commented 1 year ago

Let's get Velero backing up into Azure

clincha commented 1 year ago
helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts

kubectl create ns velero

helm install velero vmware-tanzu/velero --namespace velero -f velero-values.yml
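The velero-values.yml referenced here isn't reproduced in full in the thread. For the Azure side it would normally carry the Azure object-store plugin as an init container plus a credentials file; a rough sketch, with the plugin tag and service principal values as placeholders rather than the values actually used:

initContainers:
  - name: velero-plugin-for-microsoft-azure
    # Placeholder tag; use whichever plugin version matches the Velero release
    image: velero/velero-plugin-for-microsoft-azure:v1.7.0
    volumeMounts:
      - mountPath: /target
        name: plugins
credentials:
  useSecret: true
  secretContents:
    cloud: |
      AZURE_SUBSCRIPTION_ID=<subscription-id>
      AZURE_TENANT_ID=<tenant-id>
      AZURE_CLIENT_ID=<client-id>
      AZURE_CLIENT_SECRET=<client-secret>
      AZURE_RESOURCE_GROUP=<resource-group-holding-the-cluster-disks>
      AZURE_CLOUD_NAME=AzurePublicCloud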

I hit this error. The chart must have been updated since the last time I used it.

[kubernetes@bri-master-1 ~]$ helm install velero vmware-tanzu/velero --namespace velero -f velero-values.yml
coalesce.go:223: warning: destination for velero.configuration.volumeSnapshotLocation is a table. Ignoring non-table value ([map[config:map[] name:<nil> provider:<nil>]])
coalesce.go:223: warning: destination for velero.configuration.backupStorageLocation is a table. Ignoring non-table value ([map[accessMode:ReadWrite bucket:<nil> caCert:<nil> config:map[] credential:map[key:<nil> name:<nil>] default:<nil> name:<nil> prefix:<nil> provider:<nil>]])
Error: INSTALLATION FAILED: execution error at (velero/templates/NOTES.txt:78:4): 

#################################################################################
######   BREAKING: The config values passed contained no longer accepted    #####
######             options. See the messages below for more details.        #####
######                                                                      #####
######             To verify your updated config is accepted, you can use   #####
######             the `helm template` command.                             #####
#################################################################################

ERROR: Please make .configuration.backupStorageLocation from map to slice

ERROR: Please make .configuration.volumeSnapshotLocation from map to slice
clincha commented 1 year ago

The issue was in the configuration block: the format changed from previous versions, and backupStorageLocation and volumeSnapshotLocation now take an array of locations instead of a single map.

configuration:
  backupStorageLocation:
    - name: velero
      provider: azure
      default: true
      bucket: bristol
      accessMode: ReadWrite
      config:
        resourceGroup: velero
        subscriptionId: 2c3c5e03-9427-4fd2-9476-dee0f754b964
        storageAccount: acvelerotest
  volumeSnapshotLocation:
    - name: velero
      provider: azure
      config:
        resourceGroup: velero
        subscriptionId: 2c3c5e03-9427-4fd2-9476-dee0f754b964
  defaultVolumesToFsBackup: true
clincha commented 1 year ago

Backups are working now, but the PV backup is failing. The error looks like there is no volume snapshot class defined. Not sure how to set that up yet, but I'll give it a try.

time="2023-07-01T17:12:09Z" level=error msg="Error backing up item" backup=velero/factorio-t1 error="error executing custom action (groupResource=persistentvolumeclaims, namespace=default, name=factorio-claim): rpc error: code = Unknown desc = failed to get volumesnapshotclass for storageclass csi-rbd-sc-ssd: error listing volumesnapshot classes: the server could not find the requested resource (get volumesnapshotclasses.snapshot.storage.k8s.io)" logSource="pkg/backup/backup.go:435" name=factorio-558fbc54f4-9ht58
clincha commented 1 year ago

It turned out I needed not only the snapshot class but also the CRDs (and the snapshot controller) for the Kubernetes volume snapshot feature.

kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/client/config/crd/snapshot.storage.k8s.io_volumesnapshots.yaml

kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/client/config/crd/snapshot.storage.k8s.io_volumesnapshotcontents.yaml

kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/client/config/crd/snapshot.storage.k8s.io_volumesnapshotclasses.yaml

kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/deploy/kubernetes/snapshot-controller/rbac-snapshot-controller.yaml

kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/deploy/kubernetes/snapshot-controller/setup-snapshot-controller.yaml
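To double-check the pieces are in place after applying those manifests, something like the following should show the three CRDs and the snapshot controller (the upstream manifest deploys it to kube-system, if I recall correctly):

kubectl get crd | grep snapshot.storage.k8s.io

kubectl get deployment -n kube-system snapshot-controller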

The class looks like this:

---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-rbdplugin-snapclass
  labels:
    velero.io/csi-volumesnapshot-class: "true"
driver: rbd.csi.ceph.com
parameters:
  # String representing a Ceph cluster to provision storage snapshot from.
  # Should be unique across all Ceph clusters in use for provisioning,
  # cannot be greater than 36 bytes in length, and should remain immutable for
  # the lifetime of the StorageClass in use.
  # Ensure to create an entry in the configmap named ceph-csi-config, based on
  # csi-config-map-sample.yaml, to accompany the string chosen to
  # represent the Ceph cluster in clusterID below
  clusterID: "c61ed9ad-0a71-4f66-8d6a-b76fd0a47798"

  # Prefix to use for naming RBD snapshots.
  # If omitted, defaults to "csi-snap-".
  # snapshotNamePrefix: "foo-bar-"

  csi.storage.k8s.io/snapshotter-secret-name: csi-rbd-secret
  csi.storage.k8s.io/snapshotter-secret-namespace: ceph-csi-rbd
deletionPolicy: Delete
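Applied in the usual way (the file name here is just whatever the manifest above gets saved as), after which Velero can pick it up through the velero.io/csi-volumesnapshot-class label:

kubectl apply -f csi-rbdplugin-snapclass.yml

kubectl get volumesnapshotclass csi-rbdplugin-snapclass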
clincha commented 1 year ago

Tested using Factorio. The restore worked the first time, but the next time I tried to restore I hit an I/O error. The functionality works well; I just need to be careful to confirm that backups aren't corrupted before I delete anything.
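For reference, the test ran along these lines; the backup name and namespace are taken from the log above, and the exact flags aren't recorded in the thread:

velero backup create factorio-t1 --include-namespaces default

velero backup describe factorio-t1 --details

velero restore create --from-backup factorio-t1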