IBM / ibm-spectrum-scale-csi

The IBM Spectrum Scale Container Storage Interface (CSI) project enables container orchestrators, such as Kubernetes and OpenShift, to manage the life-cycle of persistent storage.
Apache License 2.0
66 stars 49 forks source link

logs are flooding in snapshotter and driver pod for snapwindow expire wait in volumesnapshot #664

Closed vrushch closed 7 months ago

vrushch commented 2 years ago

Describe the bug

When volumesnapshot is created and snapwindow is not expired, it is printing large number of logs messages in snapshotter and driver pod.

How to Reproduce?

  1. Create sc, pvc1 , volumesnapshotclass vsc and volumesnapshot vs1 in namespace ns1
  2. create pvc2 of sc and volumesnapshot vs2 of pvc2
  3. check snapshotter and driver pod logs till snapwindow expires

Expected behavior

Logs should be manageable in number , even for one volumesnapshot pending in 10 minutes it shows

[root@scale-csi-master snapshot]# kubectl logs ibm-spectrum-scale-csi-snapshotter-0 | grep "Wait till current snapWindow expires" | wc -l
1357
[root@scale-csi-master snapshot]#

[root@scale-csi-master snapshot]# kubectl logs ibm-spectrum-scale-csi-wmzm8 | grep "Wait till current snapWindow expires" | wc -l
491
[root@scale-csi-master snapshot]#

Data Collection and Debugging

Environmental output

[root@scale-csi-master functional-tests]# kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.2", GitCommit:"8b5a19147530eaac9476b0ab82980b4088bbc1b2", GitTreeState:"clean", BuildDate:"2021-09-15T21:38:50Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.2", GitCommit:"8b5a19147530eaac9476b0ab82980b4088bbc1b2", GitTreeState:"clean", BuildDate:"2021-09-15T21:32:41Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}
[root@scale-csi-master functional-tests]#
[root@scale-csi-master functional-tests]# kubectl get pods -o yaml | grep quay
      image: quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-driver:v2.5.0-070322
      image: quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-driver:v2.5.0-070322
      imageID: docker-pullable://quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-driver@sha256:c61b14b2dd3a2ac7c27187019acf97eb3bae57c1f67a8cea7f8ac069519f9b53
        value: quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-driver:v2.5.0-070322
      image: quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-operator:v2.5.0-070322
      image: quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-operator:v2.5.0-070322
      imageID: docker-pullable://quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-operator@sha256:241ab702e62c087b99eb5f2416675bc09b0f1d96365c34b3f292a6ca1429a9e7
      image: quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-driver:v2.5.0-070322
      image: quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-driver:v2.5.0-070322
      imageID: docker-pullable://quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-driver@sha256:c61b14b2dd3a2ac7c27187019acf97eb3bae57c1f67a8cea7f8ac069519f9b53
[root@scale-csi-master functional-tests]#

Additional context

[root@scale-csi-master snapshot]# cat storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
   name: ibm-spectrum-scale-csi-advance
provisioner: spectrumscale.csi.ibm.com
parameters:
   volBackendFs: "fs1"
   version: "2"
reclaimPolicy: Delete

[root@scale-csi-master snapshot]#
[root@scale-csi-master snapshot]# kubectl apply -f storageclass.yaml
storageclass.storage.k8s.io/ibm-spectrum-scale-csi-advance created
[root@scale-csi-master snapshot]#
[root@scale-csi-master snapshot]# cat pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: scale-advance-pvc
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: ibm-spectrum-scale-csi-advance
[root@scale-csi-master snapshot]#
[root@scale-csi-master snapshot]# kubectl apply -f pvc.yaml
persistentvolumeclaim/scale-advance-pvc created
[root@scale-csi-master snapshot]#
[root@scale-csi-master snapshot]# kubectl get pvc
NAME                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                     AGE
scale-advance-pvc   Bound    pvc-282f44b2-6f47-4263-948f-2a26700414c0   1Gi        RWX            ibm-spectrum-scale-csi-advance   20s
[root@scale-csi-master snapshot]#

[root@scale-csi-master snapshot]# cat volsnapclass.yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: ibm-spectrum-scale-snapshotclass-advance-30
driver: spectrumscale.csi.ibm.com
parameters:
  snapWindow: "30"  #Optional : Time in minutes (default=30)
deletionPolicy: Delete
[root@scale-csi-master snapshot]#
[root@scale-csi-master snapshot]# kubectl apply -f volsnapclass.yaml
volumesnapshotclass.snapshot.storage.k8s.io/ibm-spectrum-scale-snapshotclass-advance-30 created
[root@scale-csi-master snapshot]#

[root@scale-csi-master snapshot]# cat volsnap.yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: ibm-spectrum-scale-snapshot-1
spec:
  volumeSnapshotClassName: ibm-spectrum-scale-snapshotclass-advance-30
  source:
    persistentVolumeClaimName: scale-advance-pvc
[root@scale-csi-master snapshot]#
[root@scale-csi-master snapshot]# kubectl apply -f volsnap.yaml
volumesnapshot.snapshot.storage.k8s.io/ibm-spectrum-scale-snapshot-1 created
[root@scale-csi-master snapshot]#
[root@scale-csi-master snapshot]# kubectl get volumesnapshot
NAME                            READYTOUSE   SOURCEPVC           SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS                                 SNAPSHOTCONTENT                                    CREATIONTIME   AGE
ibm-spectrum-scale-snapshot-1   true         scale-advance-pvc                           1Gi           ibm-spectrum-scale-snapshotclass-advance-30   snapcontent-60d37d47-132b-4794-a86a-b5038c6d2a40   9s             9s
[root@scale-csi-master snapshot]#

[root@scale-csi-master snapshot]# cat pvc2.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: scale-advance-pvc-2
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: ibm-spectrum-scale-csi-advance
[root@scale-csi-master snapshot]# kubectl apply -f pvc2.yaml
persistentvolumeclaim/scale-advance-pvc-2 created
[root@scale-csi-master snapshot]#
[root@scale-csi-master snapshot]# kubectl get pvc
NAME                  STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                     AGE
scale-advance-pvc     Bound    pvc-282f44b2-6f47-4263-948f-2a26700414c0   1Gi        RWX            ibm-spectrum-scale-csi-advance   9m45s
scale-advance-pvc-2   Bound    pvc-f4979515-7674-459f-a357-bbc2caaad42f   1Gi        RWX            ibm-spectrum-scale-csi-advance   4m11s
[root@scale-csi-master snapshot]#

[root@scale-csi-master snapshot]# cat volsnap2.yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: ibm-spectrum-scale-snapshot-2
spec:
  volumeSnapshotClassName: ibm-spectrum-scale-snapshotclass-advance-30
  source:
    persistentVolumeClaimName: scale-advance-pvc-2
[root@scale-csi-master snapshot]#
[root@scale-csi-master snapshot]# kubectl apply -f volsnap2.yaml
volumesnapshot.snapshot.storage.k8s.io/ibm-spectrum-scale-snapshot-2 created
[root@scale-csi-master snapshot]#
[root@scale-csi-master snapshot]# kubectl get volumesnapshot
NAME                            READYTOUSE   SOURCEPVC             SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS                                 SNAPSHOTCONTENT                                    CREATIONTIME   AGE
ibm-spectrum-scale-snapshot-1   true         scale-advance-pvc                             1Gi           ibm-spectrum-scale-snapshotclass-advance-30   snapcontent-60d37d47-132b-4794-a86a-b5038c6d2a40   13m            13m
ibm-spectrum-scale-snapshot-2   false        scale-advance-pvc-2                                         ibm-spectrum-scale-snapshotclass-advance-30   snapcontent-c15148e3-62fb-424b-af0d-92eb11f15114                  6m32s
[root@scale-csi-master snapshot]#

This logs number is for single volumesnapshot in around 8 minute timeframe
[root@scale-csi-master snapshot]# kubectl logs ibm-spectrum-scale-csi-snapshotter-0 | grep ibm-spectrum-scale-snapshot-2 | wc -l
191
[root@scale-csi-master snapshot]#
[root@scale-csi-master snapshot]# kubectl logs ibm-spectrum-scale-csi-snapshotter-0 | grep "Wait till current snapWindow expires" | wc -l
1357
[root@scale-csi-master snapshot]#

[root@scale-csi-master snapshot]# kubectl logs ibm-spectrum-scale-csi-wmzm8 | grep "Wait till current snapWindow expires" | wc -l
491
[root@scale-csi-master snapshot]#

Message type in snapshooter logs

I0307 09:31:19.361678       1 snapshot_controller.go:628] setAnnVolumeSnapshotBeingCreated: volume snapshot content &{TypeMeta:{Kind: APIVersion:} ObjectMeta:{Name:snapcontent-c15148e3-62fb-424b-af0d-92eb11f15114 GenerateName: Namespace: SelfLink: UID:07a813ac-e06b-44b6-a1f6-01862c7bf628 ResourceVersion:23714022 Generation:1 CreationTimestamp:2022-03-07 09:24:33 +0000 UTC DeletionTimestamp:<nil> DeletionGracePeriodSeconds:<nil> Labels:map[] Annotations:map[snapshot.storage.kubernetes.io/volumesnapshot-being-created:yes] OwnerReferences:[] Finalizers:[snapshot.storage.kubernetes.io/volumesnapshotcontent-bound-protection] ClusterName: ManagedFields:[{Manager:snapshot-controller Operation:Update APIVersion:snapshot.storage.k8s.io/v1 Time:2022-03-07 09:24:33 +0000 UTC FieldsType:FieldsV1 FieldsV1:{"f:metadata":{"f:finalizers":{".":{},"v:\"snapshot.storage.kubernetes.io/volumesnapshotcontent-bound-protection\"":{}}},"f:spec":{".":{},"f:deletionPolicy":{},"f:driver":{},"f:source":{".":{},"f:volumeHandle":{}},"f:volumeSnapshotClassName":{},"f:volumeSnapshotRef":{".":{},"f:apiVersion":{},"f:kind":{},"f:name":{},"f:namespace":{},"f:resourceVersion":{},"f:uid":{}}}} Subresource:} {Manager:csi-snapshotter Operation:Update APIVersion:snapshot.storage.k8s.io/v1 Time:2022-03-07 09:24:36 +0000 UTC FieldsType:FieldsV1 FieldsV1:{"f:status":{".":{},"f:error":{".":{},"f:message":{},"f:time":{}},"f:readyToUse":{}}} Subresource:status} {Manager:csi-snapshotter Operation:Update APIVersion:snapshot.storage.k8s.io/v1 Time:2022-03-07 09:31:19 +0000 UTC FieldsType:FieldsV1 FieldsV1:{"f:metadata":{"f:annotations":{".":{},"f:snapshot.storage.kubernetes.io/volumesnapshot-being-created":{}}}} Subresource:}]} Spec:{VolumeSnapshotRef:{Kind:VolumeSnapshot Namespace:ibm-spectrum-scale-csi-driver Name:ibm-spectrum-scale-snapshot-2 UID:c15148e3-62fb-424b-af0d-92eb11f15114 APIVersion:snapshot.storage.k8s.io/v1 ResourceVersion:23712498 FieldPath:} DeletionPolicy:Delete Driver:spectrumscale.csi.ibm.com VolumeSnapshotClassName:0xc0000b75a0 Source:{VolumeHandle:0xc0000b7590 SnapshotHandle:<nil>}} Status:0xc00024d680}

E0307 09:32:07.583759       1 snapshot_controller.go:124] checkandUpdateContentStatus [snapcontent-c15148e3-62fb-424b-af0d-92eb11f
15114]: error occurred failed to take snapshot of the volume 1;1;14693703847104118025;8D040B0A:616E7DCB;6427a668-4e39-4bbd-bacb-3c
c0c3468bd4-ibm-spectrum-scale-csi-driver;pvc-f4979515-7674-459f-a357-bbc2caaad42f;/ibm/fs1/6427a668-4e39-4bbd-bacb-3cc0c3468bd4-ib
m-spectrum-scale-csi-driver/pvc-f4979515-7674-459f-a357-bbc2caaad42f: "rpc error: code = Internal desc = snapshot for volume [pvc-
f4979515-7674-459f-a357-bbc2caaad42f] in filesystem [fs1] is not taken. Wait till current snapWindow expires."
E0307 09:32:07.583770       1 snapshot_controller_base.go:265] could not sync content "snapcontent-c15148e3-62fb-424b-af0d-92eb11f15114": failed to take snapshot of the volume 1;1;14693703847104118025;8D040B0A:616E7DCB;6427a668-4e39-4bbd-bacb-3cc0c3468bd4-ibm-spectrum-scale-csi-driver;pvc-f4979515-7674-459f-a357-bbc2caaad42f;/ibm/fs1/6427a668-4e39-4bbd-bacb-3cc0c3468bd4-ibm-spectrum-scale-csi-driver/pvc-f4979515-7674-459f-a357-bbc2caaad42f: "rpc error: code = Internal desc = snapshot for volume [pvc-f4979515-7674-459f-a357-bbc2caaad42f] in filesystem [fs1] is not taken. Wait till current snapWindow expires."
I0307 09:32:07.583784       1 snapshot_controller_base.go:167] Failed to sync content "snapcontent-c15148e3-62fb-424b-af0d-92eb11f15114", will retry again: failed to take snapshot of the volume 1;1;14693703847104118025;8D040B0A:616E7DCB;6427a668-4e39-4bbd-bacb-3cc0c3468bd4-ibm-spectrum-scale-csi-driver;pvc-f4979515-7674-459f-a357-bbc2caaad42f;/ibm/fs1/6427a668-4e39-4bbd-bacb-3cc0c3468bd4-ibm-spectrum-scale-csi-driver/pvc-f4979515-7674-459f-a357-bbc2caaad42f: "rpc error: code = Internal desc = snapshot for volume [pvc-f4979515-7674-459f-a357-bbc2caaad42f] in filesystem [fs1] is not taken. Wait till current snapWindow expires."

Message type in driver pods on node same as snapshotter pod

E0307 09:40:45.449343       1 rest_v2.go:129] Async Job failed: {{200 The request finished successfully.} [{{[] [] 6 [EFSSG0264C The path /ibm/fs1/6427a668-4e39-4bbd-bacb-3cc0c3468bd4-ibm-spectrum-scale-csi-driver/.snapshots/snapshot-60d37d47-132b-4794-a86a-b5038c6d2a40/pvc-f4979515-7674-459f-a357-bbc2caaad42f does not exist.] []} {GET /scalemgmt/v2/filesystems/fs1/directory/6427a668-4e39-4bbd-bacb-3cc0c3468bd4-ibm-spectrum-scale-csi-driver%2F.snapshots%2Fsnapshot-60d37d47-132b-4794-a86a-b5038c6d2a40%2Fpvc-f4979515-7674-459f-a357-bbc2caaad42f map[]} 1000000018878 2022-03-07 01:40:43,426 2022-03-07 01:40:43,455 FAILED}]}
E0307 09:40:45.449396       1 utils.go:55] GRPC error: rpc error: code = Internal desc = snapshot for volume [pvc-f4979515-7674-459f-a357-bbc2caaad42f] in filesystem [fs1] is not taken. Wait till current snapWindow expires.
badri-pathak commented 8 months ago

Verified with the latest build 2.11.0, issue have been either fixed with the successive releases or not able to reproduce in the cluster.

image
Jainbrt commented 8 months ago

@saurabhwani5 can you please verify this and close if not getting reproduced ?

saurabhwani5 commented 8 months ago

issue is still reproducible

Message:     Failed to create snapshot: failed to take snapshot of the volume 1;1;14016324136648177722;BB4A0B0A:65A5F92B;28b3ad9e-0358-48a2-9ba0-db57d885c621-ibm-spectrum-scale-csi-driver;pvc-365b9185-758e-4044-8496-94085eec80a7;/ibm/fs1/28b3ad9e-0358-48a2-9ba0-db57d885c621-ibm-spectrum-scale-csi-driver/pvc-365b9185-758e-4044-8496-94085eec80a7: "rpc error: code = Internal desc = snapshot for volume [pvc-365b9185-758e-4044-8496-94085eec80a7] in filesystem [fs1] is not taken. Wait till current snapWindow expires."
deeghuge commented 7 months ago

@saurabhwani5 do you see flooding of messages ?

saurabhwani5 commented 7 months ago

closing as I don't see flooded logs