IBM / ibm-spectrum-scale-csi

The IBM Spectrum Scale Container Storage Interface (CSI) project enables container orchestrators, such as Kubernetes and OpenShift, to manage the life-cycle of persistent storage.
Apache License 2.0

race condition happens when shallow copy pvc and snapshot are deleted together #1105

Closed saurabhwani5 closed 7 months ago

saurabhwani5 commented 7 months ago

Describe the bug

When a shallow copy volume and its snapshot are deleted together, a race condition occurs in which, in some cases, the PV of the shallow copy volume is not deleted. As per my understanding, this happens for the following reason:

When the shallow copy volume and the snapshot are deleted together, the shallow copy volume deletion first removes the shallow copy directory created in the snapshot directory. The snapshot deletion then goes through, because no shallow copy directory remains. The shallow copy deletion, however, checks the snapshot directory path only afterwards, and since that path no longer exists it fails with the following error:

    Warning VolumeFailedDelete 14s (x2 over 34s) spectrumscale.csi.ibm.com_ibm-spectrum-scale-csi-provisioner-5864cc55bb-bmblx_c279e8db-bccd-4a21-b10e-b92f588bd2e8 rpc error: code = Unknown desc = unable to stat dir 36c17a30-3ab4-4625-9816-3af4b6a92b58-ibm-spectrum-scale-csi-driver/snapshot-564125d8-6d91-42fc-8ee5-dd869cf8eedb:[EFSSG0264C The path /ibm/fs1/36c17a30-3ab4-4625-9816-3af4b6a92b58-ibm-spectrum-scale-csi-driver/snapshot-564125d8-6d91-42fc-8ee5-dd869cf8eedb does not exist.]

How to Reproduce?

Please list the steps to help development teams reproduce the behavior

  1. Install CSI 2.11.0 with DCUT images

    [root@saurabh29-master ~]# oc get pods
    NAME                                                  READY   STATUS    RESTARTS       AGE
    csi-scale-fsetdemo-pod-2                              1/1     Running   0              19m
    ibm-spectrum-scale-csi-4dkqb                          3/3     Running   0              10h
    ibm-spectrum-scale-csi-attacher-78dc4fc459-b572p      1/1     Running   0              10h
    ibm-spectrum-scale-csi-attacher-78dc4fc459-xlp92      1/1     Running   0              10h
    ibm-spectrum-scale-csi-gdj68                          3/3     Running   0              10h
    ibm-spectrum-scale-csi-operator-879fcc947-wmqbs       1/1     Running   7 (4d7h ago)   5d20h
    ibm-spectrum-scale-csi-provisioner-5864cc55bb-bmblx   1/1     Running   0              10h
    ibm-spectrum-scale-csi-resizer-69446b6bc-8nq6b        1/1     Running   1 (9h ago)     10h
    ibm-spectrum-scale-csi-snapshotter-b844fd99d-vfzl7    1/1     Running   0              10h
    [root@saurabh29-master ~]# oc get cso
    NAME                     VERSION   SUCCESS
    ibm-spectrum-scale-csi   2.11.0    True
    [root@saurabh29-master ~]# oc describe pod | grep quay
    Image:         quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-driver@sha256:c110e2e0427f3799dc3a316db4748be5e8ca98bbaf50c7e3d8c7777c91c1375f
    Image:         quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-driver@sha256:c110e2e0427f3799dc3a316db4748be5e8ca98bbaf50c7e3d8c7777c91c1375f
    Image:         quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-operator@sha256:90c891b61e51be5ab689a595a4cd06919eacc659fc53967014c9e8d0eb4f7629
      CSI_DRIVER_IMAGE:      quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-driver@sha256:c110e2e0427f3799dc3a316db4748be5e8ca98bbaf50c7e3d8c7777c91c1375f
  2. Create a PVC as follows:

    
    [root@saurabh29-master Upgradetesting]# cat apply.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: csi-scale-fsetdemo-pod-2
      labels:
        app: nginx
    spec:
      containers:
      - name: web-server
        image: nginx:1.22.0
        volumeMounts:
          - name: mypvc
            mountPath: /usr/share/nginx/html/scale
        ports:
        - containerPort: 80
      volumes:
      - name: mypvc
        persistentVolumeClaim:
          claimName: scale-advance-pvc-1
          readOnly: false

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: scale-advance-pvc-1
    spec:
      accessModes:

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: ibm-spectrum-scale-csi-advance
    provisioner: spectrumscale.csi.ibm.com
    parameters:
      volBackendFs: "fs1"
      version: "2"
    reclaimPolicy: Delete

  3. Take a snapshot of the above PVC:

    [root@saurabh29-master Upgradetesting]# cat snapshot.yaml
    apiVersion: snapshot.storage.k8s.io/v1
    kind: VolumeSnapshot
    metadata:
      name: ibm-spectrum-scale-snapshot
    spec:
      volumeSnapshotClassName: ibm-spectrum-scale-snapshotclass-advance
      source:
        persistentVolumeClaimName: scale-advance-pvc-1

    apiVersion: snapshot.storage.k8s.io/v1
    kind: VolumeSnapshotClass
    metadata:
      name: ibm-spectrum-scale-snapshotclass-advance
    driver: spectrumscale.csi.ibm.com
    parameters:
      snapWindow: "30"   #Optional : Time in minutes (default=30)
    deletionPolicy: Delete

  4. Create a shallow copy volume from the snapshot:

    [root@saurabh29-master Upgradetesting]# cat restore.yaml
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: ibm-spectrum-scale-pvc-from-snapshot-2
    spec:
      accessModes:

    [root@saurabh29-master Upgradetesting]# oc get vs -w
    NAME                          READYTOUSE   SOURCEPVC             SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS                              SNAPSHOTCONTENT                                    CREATIONTIME   AGE
    ibm-spectrum-scale-snapshot   true         scale-advance-pvc-1                           1Gi           ibm-spectrum-scale-snapshotclass-advance   snapcontent-564125d8-6d91-42fc-8ee5-dd869cf8eedb   26s            50s
    ^C[root@saurabh29-master Upgradetesting]# oc get pvc -w
    NAME                                     STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                     VOLUMEATTRIBUTESCLASS   AGE
    ibm-spectrum-scale-pvc-from-snapshot-2   Pending                                                                        ibm-spectrum-scale-csi-advance                           50s
    scale-advance-pvc-1                      Bound     pvc-280bbb6d-f725-4691-adbb-5a768a66705f   1Gi        RWX            ibm-spectrum-scale-csi-advance                           9m14s
    ibm-spectrum-scale-pvc-from-snapshot-2   Pending   pvc-bc4e2535-517e-44c9-a2ee-f8bdf36755d9   0                         ibm-spectrum-scale-csi-advance                           90s
    ibm-spectrum-scale-pvc-from-snapshot-2   Bound     pvc-bc4e2535-517e-44c9-a2ee-f8bdf36755d9   1Gi        ROX            ibm-spectrum-scale-csi-advance                           90s

  5. Delete the shallow copy volume and snapshot together:

    [root@saurabh29-master Upgradetesting]# cat del.sh
    oc delete pvc ibm-spectrum-scale-pvc-from-snapshot-2
    oc delete vs ibm-spectrum-scale-snapshot --force
    [root@saurabh29-master Upgradetesting]# bash del.sh
    persistentvolumeclaim "ibm-spectrum-scale-pvc-from-snapshot-2" deleted
    Warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
    volumesnapshot.snapshot.storage.k8s.io "ibm-spectrum-scale-snapshot" force deleted

  6. Check the PV description:

    [root@saurabh29-master ~]# oc get pv
    NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                                                                  STORAGECLASS                     VOLUMEATTRIBUTESCLASS   REASON   AGE
    pvc-280bbb6d-f725-4691-adbb-5a768a66705f   1Gi        RWX            Delete           Bound      ibm-spectrum-scale-csi-driver/scale-advance-pvc-1                      ibm-spectrum-scale-csi-advance                                    23m
    pvc-bc4e2535-517e-44c9-a2ee-f8bdf36755d9   1Gi        ROX            Delete           Released   ibm-spectrum-scale-csi-driver/ibm-spectrum-scale-pvc-from-snapshot-2   ibm-spectrum-scale-csi-advance                                    14m
    [root@saurabh29-master ~]# oc describe pv pvc-bc4e2535-517e-44c9-a2ee-f8bdf36755d9
    Name:            pvc-bc4e2535-517e-44c9-a2ee-f8bdf36755d9
    Labels:
    Annotations:     pv.kubernetes.io/provisioned-by: spectrumscale.csi.ibm.com
                     volume.kubernetes.io/provisioner-deletion-secret-name:
                     volume.kubernetes.io/provisioner-deletion-secret-namespace:
    Finalizers:      [kubernetes.io/pv-protection]
    StorageClass:    ibm-spectrum-scale-csi-advance
    Status:          Released
    Claim:           ibm-spectrum-scale-csi-driver/ibm-spectrum-scale-pvc-from-snapshot-2
    Reclaim Policy:  Delete
    Access Modes:    ROX
    VolumeMode:      Filesystem
    Capacity:        1Gi
    Node Affinity:
    Message:
    Source:
        Type:              CSI (a Container Storage Interface (CSI) volume source)
        Driver:            spectrumscale.csi.ibm.com
        FSType:            gpfs
        VolumeHandle:      1;3;14016324136648177722;BB4A0B0A:65A5F92B;36c17a30-3ab4-4625-9816-3af4b6a92b58-ibm-spectrum-scale-csi-driver;pvc-bc4e2535-517e-44c9-a2ee-f8bdf36755d9;/ibm/fs1/36c17a30-3ab4-4625-9816-3af4b6a92b58-ibm-spectrum-scale-csi-driver/.snapshots/snapshot-564125d8-6d91-42fc-8ee5-dd869cf8eedb/pvc-280bbb6d-f725-4691-adbb-5a768a66705f
        ReadOnly:          false
        VolumeAttributes:      csi.storage.k8s.io/pv/name=pvc-bc4e2535-517e-44c9-a2ee-f8bdf36755d9
                               csi.storage.k8s.io/pvc/name=ibm-spectrum-scale-pvc-from-snapshot-2
                               csi.storage.k8s.io/pvc/namespace=ibm-spectrum-scale-csi-driver
                               storage.kubernetes.io/csiProvisionerIdentity=1709092347497-7554-spectrumscale.csi.ibm.com
                               version=2
                               volBackendFs=fs1
    Events:
      Type     Reason              Age                From                                                                                                                        Message
      Warning  VolumeFailedDelete  52s                spectrumscale.csi.ibm.com_ibm-spectrum-scale-csi-provisioner-5864cc55bb-bmblx_c279e8db-bccd-4a21-b10e-b92f588bd2e8  rpc error: code = Internal desc = unable to Delete shallow copy reference parent dir using FS [fs1] Error [unable to delete dir 36c17a30-3ab4-4625-9816-3af4b6a92b58-ibm-spectrum-scale-csi-driver/snapshot-564125d8-6d91-42fc-8ee5-dd869cf8eedb:[EFSSG0264C The path /ibm/fs1/36c17a30-3ab4-4625-9816-3af4b6a92b58-ibm-spectrum-scale-csi-driver/snapshot-564125d8-6d91-42fc-8ee5-dd869cf8eedb does not exist.]]
      Warning  VolumeFailedDelete  14s (x2 over 34s)  spectrumscale.csi.ibm.com_ibm-spectrum-scale-csi-provisioner-5864cc55bb-bmblx_c279e8db-bccd-4a21-b10e-b92f588bd2e8  rpc error: code = Unknown desc = unable to stat dir 36c17a30-3ab4-4625-9816-3af4b6a92b58-ibm-spectrum-scale-csi-driver/snapshot-564125d8-6d91-42fc-8ee5-dd869cf8eedb:[EFSSG0264C The path /ibm/fs1/36c17a30-3ab4-4625-9816-3af4b6a92b58-ibm-spectrum-scale-csi-driver/snapshot-564125d8-6d91-42fc-8ee5-dd869cf8eedb does not exist.]

Expected behavior

The shallow copy PV should be deleted even if the snapshot has already been deleted.

Data Collection and Debugging

/scale-csi/D.1105 csisnap.tar.gz

saurabhwani5 commented 7 months ago

closing as verified