ceph / ceph-csi

CSI driver for Ceph
Apache License 2.0

rbd: unable to create snapshots #3728

Closed: Ramshield closed this issue 1 year ago

Ramshield commented 1 year ago

Describe the bug

Creating a VolumeSnapshot of an RBD-backed PVC does not work: the snapshot is never created, and the csi-snapshotter/provisioner logs show no activity.

Environment details

- Ceph CSI: ceph-csi-rbd Helm chart 3.8.0 (csi-snapshotter sidecar v6.1.0)
- Kubernetes: k3s (service unit included under "Additional context")

Steps to reproduce

Create the StorageClass and VolumeSnapshotClass:

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"allowVolumeExpansion":true,"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{"meta.helm.sh/release-name":"cephrbd-storage","meta.helm.sh/release-namespace":"cephrbd-storage"},"labels":{"app":"ceph-csi-rbd","app.kubernetes.io/managed-by":"Helm","chart":"ceph-csi-rbd-3.8.0","heritage":"Helm","release":"cephrbd-storage"},"name":"test"},"parameters":{"clusterID":"3c9b7731-cd4f-4b47-9d7b-a37451a17773","csi.storage.k8s.io/controller-expand-secret-name":"csi-rbd-secret","csi.storage.k8s.io/controller-expand-secret-namespace":"cephrbd-storage","csi.storage.k8s.io/fstype":"ext4","csi.storage.k8s.io/node-stage-secret-name":"csi-rbd-secret","csi.storage.k8s.io/node-stage-secret-namespace":"cephrbd-storage","csi.storage.k8s.io/provisioner-secret-name":"csi-rbd-secret","csi.storage.k8s.io/provisioner-secret-namespace":"cephrbd-storage","imageFeatures":"layering","imageFormat":"2","monitors":"10.255.255.1:6789, 10.255.255.2:6789, 10.255.255.3:6789","pool":"test"},"provisioner":"rbd.csi.ceph.com","reclaimPolicy":"Delete","volumeBindingMode":"Immediate"}
    meta.helm.sh/release-name: cephrbd-storage
    meta.helm.sh/release-namespace: cephrbd-storage
  creationTimestamp: "2023-03-26T12:52:48Z"
  labels:
    app: ceph-csi-rbd
    app.kubernetes.io/managed-by: Helm
    chart: ceph-csi-rbd-3.8.0
    heritage: Helm
    release: cephrbd-storage
  name: test
  resourceVersion: "39986359"
  uid: 11ae7979-03a2-4dc0-a759-e062147ff807
parameters:
  clusterID: 3c9b7731-cd4f-4b47-9d7b-a37451a17773
  csi.storage.k8s.io/controller-expand-secret-name: csi-rbd-secret
  csi.storage.k8s.io/controller-expand-secret-namespace: cephrbd-storage
  csi.storage.k8s.io/fstype: ext4
  csi.storage.k8s.io/node-stage-secret-name: csi-rbd-secret
  csi.storage.k8s.io/node-stage-secret-namespace: cephrbd-storage
  csi.storage.k8s.io/provisioner-secret-name: csi-rbd-secret
  csi.storage.k8s.io/provisioner-secret-namespace: cephrbd-storage
  imageFeatures: layering
  imageFormat: "2"
  monitors: 10.255.255.1:6789, 10.255.255.2:6789, 10.255.255.3:6789
  pool: test
provisioner: rbd.csi.ceph.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-rbdplugin-snapclass
driver: rbd.csi.ceph.com
parameters:
  # String representing a Ceph cluster to provision storage snapshot from.
  # Should be unique across all Ceph clusters in use for provisioning,
  # cannot be greater than 36 bytes in length, and should remain immutable for
  # the lifetime of the StorageClass in use.
  # Ensure to create an entry in the configmap named ceph-csi-config, based on
  # csi-config-map-sample.yaml, to accompany the string chosen to
  # represent the Ceph cluster in clusterID below
  clusterID: 3c9b7731-cd4f-4b47-9d7b-a37451a17773

  # Prefix to use for naming RBD snapshots.
  # If omitted, defaults to "csi-snap-".
  # snapshotNamePrefix: "foo-bar-"

  csi.storage.k8s.io/snapshotter-secret-name: csi-rbd-secret
  csi.storage.k8s.io/snapshotter-secret-namespace: cephrbd-storage
deletionPolicy: Delete
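
As a quick sanity check, the snapshot.storage.k8s.io CRDs backing this VolumeSnapshotClass can be verified first (illustrative command; the CRD names assume the standard external-snapshotter install):

# confirm the snapshot CRDs exist in the cluster
kubectl get crd volumesnapshotclasses.snapshot.storage.k8s.io volumesnapshots.snapshot.storage.k8s.io volumesnapshotcontents.snapshot.storage.k8s.io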

Create PVC:

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: test

Create snapshot:

---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: rbd-pvc-snapshot
spec:
  volumeSnapshotClassName: csi-rbdplugin-snapclass
  source:
    persistentVolumeClaimName: rbd-pvc

Actual results

Nothing happens: no snapshot is ever created, and nothing relevant shows up in the logs.
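
To make the failure more concrete, the snapshot status and events can be inspected (illustrative commands, assuming the snapshot CRDs are installed):

# check whether the VolumeSnapshot ever becomes ready and what events it records
kubectl get volumesnapshot rbd-pvc-snapshot -o yaml
kubectl describe volumesnapshot rbd-pvc-snapshot
# a bound snapshot should also produce a VolumeSnapshotContent object
kubectl get volumesnapshotcontent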

Expected behavior

A snapshot.

Logs

root@jumphost-01:~/k8s# kubectl -n cephrbd-storage logs pods/cephrbd-storage-ceph-csi-rbd-provisioner-7c74444cc5-cpdds -c csi-snapshotter
I0326 11:25:30.576234       1 main.go:104] Version: v6.1.0
I0326 11:25:31.586137       1 common.go:111] Probing CSI driver for readiness
I0326 11:25:31.588945       1 leaderelection.go:248] attempting to acquire leader lease cephrbd-storage/external-snapshotter-leader-rbd-csi-ceph-com...
I0326 11:25:50.902288       1 leaderelection.go:258] successfully acquired lease cephrbd-storage/external-snapshotter-leader-rbd-csi-ceph-com
I0326 11:25:50.902650       1 snapshot_controller_base.go:133] Starting CSI snapshotter
I0326 13:07:58.856702       1 streamwatcher.go:111] Unexpected EOF during watch stream event decoding: unexpected EOF
I0326 13:07:58.856705       1 streamwatcher.go:111] Unexpected EOF during watch stream event decoding: unexpected EOF
root@jumphost-01:~/k8s# kubectl -n cephrbd-storage logs cephrbd-storage-ceph-csi-rbd-provisioner-7c74444cc5-kd7qb -c csi-snapshotter
I0326 11:25:35.748929       1 main.go:104] Version: v6.1.0
I0326 11:25:36.753173       1 common.go:111] Probing CSI driver for readiness
I0326 11:25:36.755953       1 leaderelection.go:248] attempting to acquire leader lease cephrbd-storage/external-snapshotter-leader-rbd-csi-ceph-com...
root@jumphost-01:~/k8s# kubectl -n cephrbd-storage logs cephrbd-storage-ceph-csi-rbd-provisioner-7c74444cc5-msj7c -c csi-snapshotter
I0326 11:25:40.730390       1 main.go:104] Version: v6.1.0
I0326 11:25:41.736757       1 common.go:111] Probing CSI driver for readiness
I0326 11:25:41.738431       1 leaderelection.go:248] attempting to acquire leader lease cephrbd-storage/external-snapshotter-leader-rbd-csi-ceph-com...

csi-rbdplugin logs: https://pastebin.com/MLamjf8Y

Additional context

Even though the logs complain about an empty Secret, the Secret exists and is populated:

# kubectl get secrets csi-rbd-secret -n cephrbd-storage -o yaml
apiVersion: v1
data:
  encryptionPassphrase: <SNIP>
  userID: <SNIP>
  userKey: <SNIP>
kind: Secret
metadata:
  annotations:
    meta.helm.sh/release-name: cephrbd-storage
    meta.helm.sh/release-namespace: cephrbd-storage
  creationTimestamp: "2023-03-20T22:51:21Z"
  labels:
    app: ceph-csi-rbd
    app.kubernetes.io/managed-by: Helm
    chart: ceph-csi-rbd-3.8.0
    heritage: Helm
    release: cephrbd-storage
  name: csi-rbd-secret
  namespace: cephrbd-storage
  resourceVersion: "37421186"
  uid: 347708bc-0a0a-44cb-a537-2590059bb922
type: Opaque

Running k3s:

[Unit]
Description=Lightweight Kubernetes
Documentation=https://k3s.io
Wants=network-online.target
After=network-online.target

[Install]
WantedBy=multi-user.target

[Service]
Type=notify
EnvironmentFile=-/etc/default/%N
EnvironmentFile=-/etc/sysconfig/%N
EnvironmentFile=-/etc/systemd/system/k3s.service.env
KillMode=process
Delegate=yes
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
Restart=always
RestartSec=5s
ExecStartPre=/bin/sh -xc '! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service'
ExecStartPre=-/sbin/modprobe br_netfilter
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/k3s \
    server \
        '--server' \
        'https://k8s-master<SNIP>:6443' \
        '--disable' \
        'traefik' \
        '--disable=servicelb' \
        '--disable=local-storage'
nixpanic commented 1 year ago

The logs you added are from an external-snapshotter that is not active. Please check the leases in the namespace where you deployed the provisioner, and provide the logs of the active external-snapshotter container.

The logs from the csi-rbdplugin do not show any gRPC requests for CreateSnapshot. That means they are the logs from the wrong (inactive) provisioner, or snapshot creation is not reaching the provisioner (potentially stuck, or failing in the external-snapshotter).
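
One way to confirm whether CreateSnapshot calls reach the active provisioner (a sketch; the lease name is taken from this deployment and <lease-holder-pod> is a placeholder for the pod it reports):

# find which provisioner pod currently holds the external-snapshotter lease
kubectl -n cephrbd-storage get lease external-snapshotter-leader-rbd-csi-ceph-com -o jsonpath='{.spec.holderIdentity}'
# then look for CreateSnapshot gRPC calls in that pod's csi-rbdplugin container
kubectl -n cephrbd-storage logs <lease-holder-pod> -c csi-rbdplugin | grep -i createsnapshot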

Ramshield commented 1 year ago

@nixpanic Can you please let me know what else to install? As all I installed was ceph-csi-rbd helm chart:

# kubectl get pods -A
NAMESPACE                NAME                                                        READY   STATUS    RESTARTS        AGE
cephrbd-storage          cephrbd-storage-ceph-csi-rbd-nodeplugin-km2sf               3/3     Running   3 (4d1h ago)    6d18h
cephrbd-storage          cephrbd-storage-ceph-csi-rbd-nodeplugin-mb9v4               3/3     Running   3 (4d1h ago)    6d18h
cephrbd-storage          cephrbd-storage-ceph-csi-rbd-nodeplugin-wjj9d               3/3     Running   3 (4d1h ago)    6d18h
cephrbd-storage          cephrbd-storage-ceph-csi-rbd-provisioner-7c74444cc5-cpdds   7/7     Running   0               30h
cephrbd-storage          cephrbd-storage-ceph-csi-rbd-provisioner-7c74444cc5-kd7qb   7/7     Running   0               30h
cephrbd-storage          cephrbd-storage-ceph-csi-rbd-provisioner-7c74444cc5-msj7c   7/7     Running   0               30h
kube-system              coredns-597584b69b-jtr6b                                    1/1     Running   3 (4d1h ago)    34d
kube-system              metrics-server-5c8978b444-hgzdr                             1/1     Running   3 (4d1h ago)    34d
metallb-system           metallb-controller-7898b886f6-k4ft2                         1/1     Running   3 (4d1h ago)    36d
metallb-system           metallb-speaker-25zkk                                       1/1     Running   3 (4d1h ago)    36d
metallb-system           metallb-speaker-cgtvd                                       1/1     Running   3 (4d1h ago)    36d
metallb-system           metallb-speaker-hz6dm                                       1/1     Running   3 (4d1h ago)    36d
nginx-ingress            nginx-ingress-nginx-ingress-56f6f8d48c-4mcf9                1/1     Running   0               30h

What step did I miss in the README? Thank you!

nixpanic commented 1 year ago

I don't think you need to install more. You would need to check the right logs to see what the problem could be.

Ramshield commented 1 year ago

@nixpanic Which ones? The csi-rbdplugin logs I sent are the only ones that actually had anything in them. Or do you want me to collect the logs from all of the pods?

nixpanic commented 1 year ago

You can use kubectl -n cephrbd-storage get leases to see which pod/container is active for a certain task. The external-snapshotter container should hold a lease, and that rbd-provisioner pod should have the logs that provide a hint on what is failing.

Ramshield commented 1 year ago

Same result, nothing in the logs...

root@jumphost-01:~# kubectl -n cephrbd-storage get leases
NAME                                           HOLDER                                                                                           AGE
external-attacher-leader-rbd-csi-ceph-com      cephrbd-storage-ceph-csi-rbd-provisioner-7c74444cc5-cpdds                                        7d19h
external-resizer-rbd-csi-ceph-com              cephrbd-storage-ceph-csi-rbd-provisioner-7c74444cc5-cpdds                                        7d19h
external-snapshotter-leader-rbd-csi-ceph-com   cephrbd-storage-ceph-csi-rbd-provisioner-7c74444cc5-cpdds                                        7d19h
rbd-csi-ceph-com                               1679829931631-8081-rbd-csi-ceph-com                                                              7d19h
rbd.csi.ceph.com-cephrbd-storage               cephrbd-storage-ceph-csi-rbd-provisioner-7c74444cc5-cpdds_51b31664-89bb-4f1e-8f24-73ebe84a7ebe   7d19h
root@jumphost-01:~# kubectl -n cephrbd-storage logs pods/cephrbd-storage-ceph-csi-rbd-provisioner-7c74444cc5-cpdds -c csi-snapshotter
I0326 11:25:30.576234       1 main.go:104] Version: v6.1.0
I0326 11:25:31.586137       1 common.go:111] Probing CSI driver for readiness
I0326 11:25:31.588945       1 leaderelection.go:248] attempting to acquire leader lease cephrbd-storage/external-snapshotter-leader-rbd-csi-ceph-com...
I0326 11:25:50.902288       1 leaderelection.go:258] successfully acquired lease cephrbd-storage/external-snapshotter-leader-rbd-csi-ceph-com
I0326 11:25:50.902650       1 snapshot_controller_base.go:133] Starting CSI snapshotter
I0326 13:07:58.856702       1 streamwatcher.go:111] Unexpected EOF during watch stream event decoding: unexpected EOF
I0326 13:07:58.856705       1 streamwatcher.go:111] Unexpected EOF during watch stream event decoding: unexpected EOF
Madhu-1 commented 1 year ago

@Ramshield I see you don't have the snapshot controller running. Did you follow the steps mentioned here? https://github.com/kubernetes-csi/external-snapshotter/#usage
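
For anyone hitting the same problem: the linked usage doc boils down to installing the snapshot CRDs and the common snapshot-controller, roughly like this (a sketch based on the external-snapshotter README; exact paths can differ between releases):

git clone https://github.com/kubernetes-csi/external-snapshotter.git
cd external-snapshotter
# install the snapshot.storage.k8s.io CRDs
kubectl kustomize client/config/crd | kubectl create -f -
# deploy the common snapshot-controller
kubectl -n kube-system kustomize deploy/kubernetes/snapshot-controller | kubectl create -f -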

Ramshield commented 1 year ago

@Madhu-1 Thank you very much, that fixed it! :) Much appreciated!