DataONEorg / k8s-cluster

Documentation on the DataONE Kubernetes cluster
Apache License 2.0

Enable CSI snapshots #41

Open nickatnceas opened 1 year ago

nickatnceas commented 1 year ago

Velero is failing to complete backups during the initial backup runs in #37. It appears that this is due to lack of snapshot support:

outin@halt:~/velero/velero-v1.12.0-darwin-amd64$ velero backup describe backup-full
Name:         backup-full
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  velero.io/resource-timeout=10m0s
              velero.io/source-cluster-k8s-gitversion=v1.22.0
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=22

Phase:  PartiallyFailed (run `velero backup logs backup-full` for more information)
outin@halt:~/velero/velero-v1.12.0-darwin-amd64$ velero backup logs backup-full | grep -v level=info
time="2023-09-26T21:33:35Z" level=error msg="no matches for kind \"VolumeSnapshotContent\" in version \"snapshot.storage.k8s.io/v1\"" backup=velero/backup-full logSource="pkg/controller/backup_controller.go:676"
outin@halt:~/velero/velero-v1.12.0-darwin-amd64$
outin@halt:~/velero/velero-v1.12.0-darwin-amd64$ kubectl get volumesnapshotclass
error: the server doesn't have a resource type "volumesnapshotclass"
outin@halt:~/velero/velero-v1.12.0-darwin-amd64$ kubectl get volumesnapshot
error: the server doesn't have a resource type "volumesnapshot"

Snapshot support should be enabled on both K8s dev and prod, and can be enabled with the instructions in https://github.com/ceph/ceph-csi/blob/devel/docs/snap-clone.md
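
One way to confirm which snapshot CRDs are missing, before and after the install, is to compare the API server's resource list against the three kinds that Velero and `kubectl` complained about above. The helper below is only a sketch: `missing_snapshot_crds` is a hypothetical name, and it assumes it is fed the output of `kubectl api-resources --api-group=snapshot.storage.k8s.io -o name` on stdin.

```shell
# Hypothetical helper (not part of ceph-csi or Velero).
# Reads resource names on stdin, one per line, in the form
# "<plural>.snapshot.storage.k8s.io", and prints every snapshot
# kind that is NOT present in that list.
missing_snapshot_crds() {
  present=$(cat)
  for kind in volumesnapshotclasses volumesnapshotcontents volumesnapshots; do
    if ! printf '%s\n' "$present" | grep -qxF "${kind}.snapshot.storage.k8s.io"; then
      printf '%s\n' "$kind"
    fi
  done
}

# Usage against a live cluster (assumes kubectl is configured):
#   kubectl api-resources --api-group=snapshot.storage.k8s.io -o name | missing_snapshot_crds
```

An empty result would suggest all three CRDs are registered; any kind that is printed still needs to be installed.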

nickatnceas commented 1 year ago

I used ceph-csi 3.7.2, which is compatible with K8s 1.23.

Per the linked docs, I installed the snapshot controller:

root@docker-dev-ucsb-1:/home/outin/in/ceph-csi-3.7.2# ./scripts/install-snapshot.sh install
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2636  100  2636    0     0  56085      0 --:--:-- --:--:-- --:--:-- 56085
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1296  100  1296    0     0  43200      0 --:--:-- --:--:-- --:--:-- 43200
serviceaccount/snapshot-controller created
clusterrole.rbac.authorization.k8s.io/snapshot-controller-runner created
clusterrolebinding.rbac.authorization.k8s.io/snapshot-controller-role created
role.rbac.authorization.k8s.io/snapshot-controller-leaderelection created
rolebinding.rbac.authorization.k8s.io/snapshot-controller-leaderelection created
Warning: policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
podsecuritypolicy.policy/csi-snapshotter-psp created
role.rbac.authorization.k8s.io/csi-snapshotter-psp created
rolebinding.rbac.authorization.k8s.io/csi-snapshotter-psp created
deployment.apps/snapshot-controller created
customresourcedefinition.apiextensions.k8s.io/volumesnapshotclasses.snapshot.storage.k8s.io created
customresourcedefinition.apiextensions.k8s.io/volumesnapshotcontents.snapshot.storage.k8s.io created
customresourcedefinition.apiextensions.k8s.io/volumesnapshots.snapshot.storage.k8s.io created
snapshotter pod status: true
snapshot controller creation successful

Created the RBD snap resources:

root@docker-dev-ucsb-1:/home/outin/in/ceph-csi-3.7.2/examples/rbd# vim snapshotclass.yaml

root@docker-dev-ucsb-1:/home/outin/in/ceph-csi-3.7.2/examples/rbd# kubectl create -f snapshotclass.yaml
volumesnapshotclass.snapshot.storage.k8s.io/csi-rbdplugin-snapclass created

root@docker-dev-ucsb-1:/home/outin/in/ceph-csi-3.7.2/examples/rbd# kubectl get volumesnapshotclass
NAME                      DRIVER             DELETIONPOLICY   AGE
csi-rbdplugin-snapclass   rbd.csi.ceph.com   Delete           53s

root@docker-dev-ucsb-1:/home/outin/in/ceph-csi-3.7.2/examples/rbd# kubectl create -f snapshot.yaml
volumesnapshot.snapshot.storage.k8s.io/rbd-pvc-snapshot created

root@docker-dev-ucsb-1:/home/outin/in/ceph-csi-3.7.2/examples/rbd# kubectl get volumesnapshot
NAME               READYTOUSE   SOURCEPVC   SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS             SNAPSHOTCONTENT   CREATIONTIME   AGE
rbd-pvc-snapshot   false        rbd-pvc                                           csi-rbdplugin-snapclass                                    12m

READYTOUSE is not switching to true. Need to figure out why before continuing.
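
As a starting point for the debugging, the stuck snapshots can be picked out of the `kubectl get volumesnapshot` table by filtering on the READYTOUSE column. This is a rough sketch rather than an established tool: `not_ready_snapshots` is a hypothetical name, and it locates the column by its header so that it should work both with and without `-A`.

```shell
# Hypothetical helper: reads a `kubectl get volumesnapshot` table on stdin
# and prints the NAME of every snapshot whose READYTOUSE value is not "true".
# NAME is assumed to be the column immediately before READYTOUSE, which
# matches the table layouts shown in this issue.
not_ready_snapshots() {
  awk '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "READYTOUSE") col = i; next }
    col && $col != "true" { print $(col - 1) }
  '
}

# Usage against a live cluster (assumes kubectl is configured):
#   kubectl get volumesnapshot | not_ready_snapshots
#   kubectl describe volumesnapshot rbd-pvc-snapshot
```

The `kubectl describe` output for each name that is printed may include events explaining why the snapshot has not become ready.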

nickatnceas commented 1 year ago

I continued on, creating the CephFS snapshot class:

root@docker-dev-ucsb-1:/home/outin/in/ceph-csi-3.7.2/examples/cephfs# vim snapshotclass.yaml

root@docker-dev-ucsb-1:/home/outin/in/ceph-csi-3.7.2/examples/cephfs# kubectl create -f snapshotclass.yaml
volumesnapshotclass.snapshot.storage.k8s.io/csi-cephfsplugin-snapclass created

I created a new RBD PVC to use as the snapshot source (SOURCEPVC) and tried snapshotting it according to the docs, but was unsuccessful. However, Velero is now able to complete a backup successfully, so it appears that snapshots are working for Velero.

nickatnceas commented 1 year ago

Reopening, as I haven't deployed this to production yet.

nickatnceas commented 11 months ago

I created a successful test RBD snapshot:

snapshot.yaml

---
# Snapshot API version compatibility matrix:
# v1beta1:
#   v1.17 =< k8s < v1.20
#   2.x =< snapshot-controller < v4.x
# v1:
#   k8s >= v1.20
#   snapshot-controller >= v4.x
# We recommend to use {sidecar, controller, crds} of same version
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: rbd-pvc-snapshot-test-2
spec:
  volumeSnapshotClassName: csi-rbdplugin-snapclass
  source:
    persistentVolumeClaimName: test-pvc

outin@halt:~/k8s$ kubectl create -f snapshot.yaml -n nick

outin@halt:~/k8s$ kubectl get volumesnapshot -A
NAMESPACE   NAME                      READYTOUSE   SOURCEPVC   SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS             SNAPSHOTCONTENT                                    CREATIONTIME   AGE
default     rbd-pvc-snapshot          false        rbd-pvc                                           csi-rbdplugin-snapclass                                                                     21d
default     rbd-pvc-snapshot-test     false        test-pvc                                          csi-rbdplugin-snapclass                                                                     19d
nick        rbd-pvc-snapshot-test-2   true         test-pvc                            10Gi          csi-rbdplugin-snapclass   snapcontent-15ce6684-8b99-4af5-ba23-a2b8de3fde86   20s            24s
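
A further check that the snapshot is usable would be to restore it into a new PVC through the standard `dataSource` field. The function below is a minimal sketch that only renders such a manifest: `render_restore_pvc` and the PVC name `restored-test-pvc` are hypothetical, and the storage class name `csi-rbd-sc` is an assumption taken from the ceph-csi examples rather than from this cluster.

```shell
# Hypothetical helper: prints a PVC manifest that restores from a VolumeSnapshot.
# Arguments: <new-pvc-name> <snapshot-name> <storage-class> <size>
render_restore_pvc() {
  pvc_name=$1
  snapshot_name=$2
  storage_class=$3
  size=$4
  cat <<EOF
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${pvc_name}
spec:
  storageClassName: ${storage_class}
  dataSource:
    name: ${snapshot_name}
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: ${size}
EOF
}

# Usage (storage class is an assumption; the PVC must be created in the
# snapshot's namespace, and the size should be at least RESTORESIZE):
#   render_restore_pvc restored-test-pvc rbd-pvc-snapshot-test-2 csi-rbd-sc 10Gi \
#     | kubectl create -n nick -f -
```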

nickatnceas commented 11 months ago

I believe I am not able to deploy CephFS snapshots because our setup lacks CephFS credentials and dynamic provisioning. I'm looking into this issue in #42.

nickatnceas commented 11 months ago

Now that dynamic provisioning of CephFS volumes is working, I'm able to create CephFS volume snapshots. The following is working on k8s-dev:

cephfs-snapshotclass.yaml

---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-cephfsplugin-snapclass
driver: cephfs.csi.ceph.com
parameters:
  clusterID: 8aa4d4a0-a209-11ea-baf5-ffc787bfc812
  snapshotNamePrefix: "k8s-dev-csi-snap-"
  csi.storage.k8s.io/snapshotter-secret-name: csi-cephfs-secret
  csi.storage.k8s.io/snapshotter-secret-namespace: default
deletionPolicy: Delete

cephfs-pvc-snapshot.yaml

---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: cephfs-pvc-snapshot-test-2
  namespace: nick
spec:
  volumeSnapshotClassName: csi-cephfsplugin-snapclass
  source:
    persistentVolumeClaimName: csi-cephfs-pvc-test-12

outin@halt:~/k8s$ kubectl create -f cephfs-pvc-snapshot.yaml
volumesnapshot.snapshot.storage.k8s.io/cephfs-pvc-snapshot-test-2 created

outin@halt:~/k8s$ kubectl get volumesnapshot -n nick
NAME                         READYTOUSE   SOURCEPVC                SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS                SNAPSHOTCONTENT                                    CREATIONTIME   AGE
cephfs-pvc-snapshot-test-2   true         csi-cephfs-pvc-test-12                           10Gi          csi-cephfsplugin-snapclass   snapcontent-06183985-06f5-47e2-83a1-81aac6aeab47   6s             8s

Next up is to configure and test CSI RBD and CephFS snapshots on k8s-prod.
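
For the k8s-prod rollout, the CephFS snapshot class above could be rendered from a small template so that only the cluster-specific values change. This is a sketch under assumptions: `render_cephfs_snapclass` is a hypothetical name, the prod cluster ID is left as a placeholder, the `k8s-prod-csi-snap-` prefix is a guess by analogy with dev, and the secret name and namespace are assumed to match k8s-dev.

```shell
# Hypothetical helper: prints a CephFS VolumeSnapshotClass manifest.
# Arguments: <ceph-cluster-id> <snapshot-name-prefix>
# The secret name and namespace are copied from the k8s-dev manifest and
# are assumed (not confirmed) to be the same on k8s-prod.
render_cephfs_snapclass() {
  cluster_id=$1
  prefix=$2
  cat <<EOF
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-cephfsplugin-snapclass
driver: cephfs.csi.ceph.com
parameters:
  clusterID: ${cluster_id}
  snapshotNamePrefix: "${prefix}"
  csi.storage.k8s.io/snapshotter-secret-name: csi-cephfs-secret
  csi.storage.k8s.io/snapshotter-secret-namespace: default
deletionPolicy: Delete
EOF
}

# Usage (replace the placeholder with the real prod cluster ID):
#   render_cephfs_snapclass '<k8s-prod-cluster-id>' 'k8s-prod-csi-snap-' | kubectl create -f -
```

Called with the dev cluster ID and the `k8s-dev-csi-snap-` prefix, it should reproduce the cephfs-snapshotclass.yaml shown above.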