ceph / ceph-csi

CSI driver for Ceph
Apache License 2.0

Ceph-csi-rbd snapshot creationtime showing as invalid in kubectl get volumesnapshot commands #2863

Closed msfrucht closed 2 years ago

msfrucht commented 2 years ago

Describe the bug

After taking a snapshot of ceph-csi-rbd volumes using ceph-csi 3.5, any "kubectl get volumesnapshot" command shows the CREATIONTIME column as "<invalid>". The snapshot itself appears to be taken successfully.

Environment details

Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.3", GitCommit:"c92036820499fedefec0f847e2054d824aea6cd1", GitTreeState:"clean", BuildDate:"2021-10-27T18:41:28Z", GoVersion:"go1.16.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.3", GitCommit:"c92036820499fedefec0f847e2054d824aea6cd1", GitTreeState:"clean", BuildDate:"2021-10-27T18:35:25Z", GoVersion:"go1.16.9", Compiler:"gc", Platform:"linux/amd64"}

Steps to reproduce the behavior:

  1. Setup ceph-csi-rbd

  2. Create storage class and snapshot class

  3. Create Ceph-csi-rbd PVC

  4. Create VolumeSnapshot

  5. Setup details: '...'

  6. Deployment to trigger the issue '....'

  7. See error

Actual results

Snapshots are created, but the CREATIONTIME column shows "<invalid>" when queried with kubectl.

Expected behavior

Snapshots should show the creation time as it appears in the status.creationTime field.

Logs

[michael@hyperdevvm1a1 rbd]$ kubectl get volumesnapshot
NAME                   READYTOUSE   SOURCEPVC              SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS                    SNAPSHOTCONTENT                                    CREATIONTIME   AGE
ceph-csi-rbd-35-test   true         ceph-csi-rbd-35-test                           1Gi           cirrus-csi-rbdplugin-snapclass   snapcontent-aea7d98c-8cb2-40ce-95f1-67088f683d31   <invalid>      80s
[michael@hyperdevvm1a1 rbd]$ kubectl get volumesnapshot ceph-csi-rbd-35-test -o yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"snapshot.storage.k8s.io/v1","kind":"VolumeSnapshot","metadata":{"annotations":{},"name":"ceph-csi-rbd-35-test","namespace":"default"},"spec":{"source":{"persistentVolumeClaimName":"ceph-csi-rbd-35-test"},"volumeSnapshotClassName":"cirrus-csi-rbdplugin-snapclass"}}
  creationTimestamp: "2022-02-04T19:58:34Z"
  finalizers:
  - snapshot.storage.kubernetes.io/volumesnapshot-as-source-protection
  - snapshot.storage.kubernetes.io/volumesnapshot-bound-protection
  generation: 1
  name: ceph-csi-rbd-35-test
  namespace: default
  resourceVersion: "47122063"
  uid: aea7d98c-8cb2-40ce-95f1-67088f683d31
spec:
  source:
    persistentVolumeClaimName: ceph-csi-rbd-35-test
  volumeSnapshotClassName: cirrus-csi-rbdplugin-snapclass
status:
  boundVolumeSnapshotContentName: snapcontent-aea7d98c-8cb2-40ce-95f1-67088f683d31
  creationTime: "2022-02-04T20:03:17Z"
  readyToUse: true
  restoreSize: 1Gi
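
The two timestamps in the YAML above are worth comparing: status.creationTime (20:03:17Z) is later than metadata.creationTimestamp (19:58:34Z), while AGE was only 80s when the listing was taken. A minimal Python sketch of that arithmetic follows. The `human_age` helper is hypothetical and only approximates how Kubernetes renders age columns (a timestamp more than about a second in the future is shown as `<invalid>`); the `now` value is an assumption derived from the 80s AGE column, not a logged value.

```python
from datetime import datetime, timedelta, timezone


def parse_k8s_time(value: str) -> datetime:
    """Parse an RFC 3339 UTC timestamp as printed in the YAML above."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def human_age(timestamp: datetime, now: datetime) -> str:
    """Simplified, hypothetical approximation of an age column.

    Only the sign handling matters here: a timestamp in the future
    yields a negative duration, which is rendered as "<invalid>".
    """
    seconds = int((now - timestamp).total_seconds())
    if seconds < -1:
        return "<invalid>"
    if seconds < 0:
        return "0s"
    return f"{seconds}s"


created = parse_k8s_time("2022-02-04T19:58:34Z")        # metadata.creationTimestamp
snapshot_time = parse_k8s_time("2022-02-04T20:03:17Z")  # status.creationTime

# Assumption: the listing was taken 80s after object creation (the AGE column).
now = created + timedelta(seconds=80)

offset = snapshot_time - created
print("creationTime is ahead of creationTimestamp by", offset)
print("AGE column:", human_age(created, now))
print("CREATIONTIME column:", human_age(snapshot_time, now))
```

If this reading is right, the column would start showing a normal age once the wall clock passes 20:03:17Z. It would also hint at a clock offset between whichever component stamped status.creationTime and the rest of the cluster, which the logs here do not confirm.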

If the issue is in snapshot creation and deletion please attach complete logs of below containers.

ceph-csi-snapshotter.txt

csi-rbdplugin.txt

Additional context

Madhu-1 commented 2 years ago

@msfrucht I don't see any errors in the snapshotter logs. Can you please check that you are using the same version of csi-snapshotter and the snapshot controller in the cluster?

msfrucht commented 2 years ago

snapshot-controller pod yaml shows: image: k8s.gcr.io/sig-storage/snapshot-controller:v4.2.0

csi-snapshotter in ceph-csi-rbd-provisioner pods: image: k8s.gcr.io/sig-storage/csi-snapshotter:v4.2.0
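
The version check requested above comes down to comparing the image tags of the two components. A small Python sketch, assuming the image strings are copied by hand from the pod specs; `image_tag` is a hypothetical helper name, not part of any Kubernetes tooling.

```python
def image_tag(image: str) -> str:
    """Return the tag of a container image reference, or "latest" if none is given."""
    # Drop a digest suffix such as "@sha256:..." if one is present.
    name = image.split("@", 1)[0]
    # Only the last path component can carry a tag; earlier colons may be a registry port.
    last_component = name.rsplit("/", 1)[-1]
    if ":" in last_component:
        return last_component.rsplit(":", 1)[1]
    return "latest"


# Image strings as reported in the comment above.
images = {
    "snapshot-controller": "k8s.gcr.io/sig-storage/snapshot-controller:v4.2.0",
    "csi-snapshotter": "k8s.gcr.io/sig-storage/csi-snapshotter:v4.2.0",
}

tags = {name: image_tag(image) for name, image in images.items()}
versions_match = len(set(tags.values())) == 1

print(tags)
print("versions match:", versions_match)
```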

Madhu-1 commented 2 years ago

@msfrucht I tested with the same versions and am not seeing anything wrong.

$ kubectl get po/snapshot-controller-bb7675d55-fwgk6 -nkube-system -oyaml |grep -i snapshot-controller:
    image: k8s.gcr.io/sig-storage/snapshot-controller:v4.2.0
    image: k8s.gcr.io/sig-storage/snapshot-controller:v4.2.0

$ kuberc get po csi-rbdplugin-provisioner-569f64b6fd-l6mlf -oyaml |grep -i csi-snapshotter:
    image: k8s.gcr.io/sig-storage/csi-snapshotter:v4.2.0
    image: k8s.gcr.io/sig-storage/csi-snapshotter:v4.2.0

$ kuberc get po csi-rbdplugin-provisioner-569f64b6fd-l6mlf -oyaml |grep -i cephcsi/cephcsi
    image: quay.io/cephcsi/cephcsi:v3.5.1
    image: quay.io/cephcsi/cephcsi:v3.5.1

$ kubectl get volumesnapshots
NAME                 READYTOUSE   SOURCEPVC   SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS             SNAPSHOTCONTENT                                    CREATIONTIME   AGE
rbd-pvc-snapshot     true         rbd-pvc                             1Gi           csi-rbdplugin-snapclass   snapcontent-7b8686ff-015c-4780-a32e-63d8ad56a776   24h            24h
rbd-pvc-snapshot-1   true         rbd-pvc                             1Gi           csi-rbdplugin-snapclass   snapcontent-034d97ac-565d-40c4-8d12-2d864ef063e1   2m42s          2m43s

Can you please open an issue with https://github.com/kubernetes-csi/external-snapshotter and check whether anything is wrong with the snapshotter?

Madhu-1 commented 2 years ago

Also check whether there are any failures in the snapshot-controller pod logs.

msfrucht commented 2 years ago

Yes, there are failures in the snapshot-controller logs and a significant number of pod restarts. For the past few weeks we've been having off-and-on network connectivity brownouts that come in waves, each lasting from a fraction of a second to several seconds.

W0211 02:45:57.583344 1 reflector.go:441] github.com/kubernetes-csi/external-snapshotter/client/v4/informers/externalversions/factory.go:117: watch of *v1.VolumeSnapshotClass ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding

snapshot-controller-bb7675d55-b6gjb                            1/1     Running   223 (14h ago)   90d
snapshot-controller-bb7675d55-mrbnt                            1/1     Running   216 (15h ago)   90d
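
The restart counts above can be pulled out of the pod listing mechanically. A rough Python sketch, assuming the default column layout shown here (NAME, READY, STATUS, RESTARTS, AGE); the `parse_restarts` name and the threshold of 100 are arbitrary illustrations.

```python
import re

# One pod row: name, ready count, status, restart count with an optional
# "(<time> ago)" suffix, and age. Header rows do not match this pattern.
POD_LINE = re.compile(
    r"^(?P<name>\S+)\s+(?P<ready>\d+/\d+)\s+(?P<status>\S+)\s+"
    r"(?P<restarts>\d+)(?:\s+\((?P<last>[^)]*)\))?\s+(?P<age>\S+)\s*$"
)


def parse_restarts(output: str) -> dict:
    """Map pod name to restart count for every row that looks like a pod line."""
    counts = {}
    for line in output.splitlines():
        match = POD_LINE.match(line.strip())
        if match:
            counts[match.group("name")] = int(match.group("restarts"))
    return counts


sample = """\
snapshot-controller-bb7675d55-b6gjb                            1/1     Running   223 (14h ago)   90d
snapshot-controller-bb7675d55-mrbnt                            1/1     Running   216 (15h ago)   90d
"""

restarts = parse_restarts(sample)
# Arbitrary threshold, chosen only to flag the pods shown above.
noisy = [name for name, count in restarts.items() if count > 100]

print(restarts)
print("pods with many restarts:", noisy)
```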

And it seems to have resolved itself at some point.

kubectl get volumesnapshot
NAME                   READYTOUSE   SOURCEPVC              SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS                    SNAPSHOTCONTENT                                    CREATIONTIME   AGE
ceph-csi-rbd-35-test   true         ceph-csi-rbd-35-test                           1Gi           cirrus-csi-rbdplugin-snapclass   snapcontent-aea7d98c-8cb2-40ce-95f1-67088f683d31   6d21h          6d21h

Looks like this is a combination of an environmental issue and a snapshot-controller issue, not a ceph-csi issue. Closing.