IBM / ibm-spectrum-scale-csi

The IBM Spectrum Scale Container Storage Interface (CSI) project enables container orchestrators, such as Kubernetes and OpenShift, to manage the life-cycle of persistent storage.
Apache License 2.0

PV was not deleted, but fileset was deleted, in error case when owning cluster is unhealthy #296

Open gandhisanjayv opened 4 years ago

gandhisanjayv commented 4 years ago

Describe the bug

I deleted two PVCs while the owning cluster was unhealthy. The file system was unmounted because the cluster had lost quorum. The problem was fixed after a few minutes. The PVCs were deleted, but their PVs were not, even though the corresponding filesets were deleted.

State before delete

NAME                   STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                     AGE
pvc-gpfstest-replica   Bound    pvc-a20008bc-8fa3-4095-9a27-3bde060c5d49   100Gi      RWX            ibm-spectrum-scale-csi-fileset   4d18h
pvc-stress-ng          Bound    pvc-0f4d734e-dda9-49d6-8706-1d7947c87426   100Gi      RWX            ibm-spectrum-scale-csi-fileset   4d17h

mmlsfileset fs1
Filesets in file system 'fs1':
Name                     Status    Path
root                     Linked    /var/gpfs/fs1
cnss-demo-fset1          Linked    /var/gpfs/fs1/cnss-demo-fset1
pvc-c498f87f-f3b7-4b9f-a559-0e07c9427284 Linked /var/gpfs/fs1/pvc-c498f87f-f3b7-4b9f-a559-0e07c9427284
pvc-b75d1da6-a2e8-4e11-9006-a6e3a9babfe1 Linked /var/gpfs/fs1/pvc-b75d1da6-a2e8-4e11-9006-a6e3a9babfe1
pvc-a20008bc-8fa3-4095-9a27-3bde060c5d49 Linked /var/gpfs/fs1/pvc-a20008bc-8fa3-4095-9a27-3bde060c5d49
pvc-0f4d734e-dda9-49d6-8706-1d7947c87426 Linked /var/gpfs/fs1/pvc-0f4d734e-dda9-49d6-8706-1d7947c87426

State after delete

 oc get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                                                STORAGECLASS                     REASON   AGE
pvc-0f4d734e-dda9-49d6-8706-1d7947c87426   100Gi      RWX            Delete           Released   ibm-spectrum-scale-csi-driver/pvc-stress-ng          ibm-spectrum-scale-csi-fileset            4d18h
pvc-a20008bc-8fa3-4095-9a27-3bde060c5d49   100Gi      RWX            Delete           Released   ibm-spectrum-scale-csi-driver/pvc-gpfstest-replica   ibm-spectrum-scale-csi-fileset            4d19h
pvc-c498f87f-f3b7-4b9f-a559-0e07c9427284   100Gi      RWX            Delete           Released   ibm-spectrum-scale-csi-driver/scale-fset-pvc         ibm-spectrum-scale-csi-fileset            7d19h
registry-storage                           200Gi      RWX            Recycle          Bound      openshift-image-registry/image-registry-storage

mmlsfileset fs1
Filesets in file system 'fs1':
Name                     Status    Path
root                     Linked    /var/gpfs/fs1
cnss-demo-fset1          Linked    /var/gpfs/fs1/cnss-demo-fset1
pvc-c498f87f-f3b7-4b9f-a559-0e07c9427284 Linked /var/gpfs/fs1/pvc-c498f87f-f3b7-4b9f-a559-0e07c9427284
pvc-b75d1da6-a2e8-4e11-9006-a6e3a9babfe1 Linked /var/gpfs/fs1/pvc-b75d1da6-a2e8-4e11-9006-a6e3a9babfe1
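To make the mismatch in the after-delete output explicit, the following Python sketch cross-checks the `oc get pv` rows against the `mmlsfileset fs1` listing and reports PVs of the `ibm-spectrum-scale-csi-fileset` storage class whose fileset no longer exists. It assumes, as the output in this report shows, that a dynamically provisioned PV and its fileset share the same name. The sample data is copied from the output above; the helper names (`csi_fileset_pvs`, `fileset_names`, `orphaned_pvs`) are illustrative only.

```python
"""Cross-check PVs against GPFS filesets to find orphaned PVs.

Assumption: a dynamically provisioned fileset PV has the same name as its
fileset, as the output in this report shows.
"""
from __future__ import annotations

STORAGE_CLASS = "ibm-spectrum-scale-csi-fileset"

# `oc get pv` output after the PVC deletion, copied from the report.
OC_GET_PV = """\
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                                                STORAGECLASS                     REASON   AGE
pvc-0f4d734e-dda9-49d6-8706-1d7947c87426   100Gi      RWX            Delete           Released   ibm-spectrum-scale-csi-driver/pvc-stress-ng          ibm-spectrum-scale-csi-fileset            4d18h
pvc-a20008bc-8fa3-4095-9a27-3bde060c5d49   100Gi      RWX            Delete           Released   ibm-spectrum-scale-csi-driver/pvc-gpfstest-replica   ibm-spectrum-scale-csi-fileset            4d19h
pvc-c498f87f-f3b7-4b9f-a559-0e07c9427284   100Gi      RWX            Delete           Released   ibm-spectrum-scale-csi-driver/scale-fset-pvc         ibm-spectrum-scale-csi-fileset            7d19h
registry-storage                           200Gi      RWX            Recycle          Bound      openshift-image-registry/image-registry-storage
"""

# `mmlsfileset fs1` output after the PVC deletion, copied from the report.
MMLSFILESET = """\
Filesets in file system 'fs1':
Name                     Status    Path
root                     Linked    /var/gpfs/fs1
cnss-demo-fset1          Linked    /var/gpfs/fs1/cnss-demo-fset1
pvc-c498f87f-f3b7-4b9f-a559-0e07c9427284 Linked /var/gpfs/fs1/pvc-c498f87f-f3b7-4b9f-a559-0e07c9427284
pvc-b75d1da6-a2e8-4e11-9006-a6e3a9babfe1 Linked /var/gpfs/fs1/pvc-b75d1da6-a2e8-4e11-9006-a6e3a9babfe1
"""


def csi_fileset_pvs(pv_output: str) -> set[str]:
    """Return names of PVs that belong to the CSI fileset storage class."""
    names = set()
    for line in pv_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Rows without a storage class (e.g. registry-storage) are skipped.
        if fields and STORAGE_CLASS in fields:
            names.add(fields[0])
    return names


def fileset_names(mmlsfileset_output: str) -> set[str]:
    """Return fileset names from `mmlsfileset <fs>` output."""
    names = set()
    for line in mmlsfileset_output.splitlines()[2:]:  # skip title and header
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def orphaned_pvs(pv_output: str, mmlsfileset_output: str) -> list[str]:
    """PVs whose backing fileset no longer exists, sorted by name."""
    return sorted(csi_fileset_pvs(pv_output) - fileset_names(mmlsfileset_output))


if __name__ == "__main__":
    for name in orphaned_pvs(OC_GET_PV, MMLSFILESET):
        print(name)
```

Run against the data above, it prints `pvc-0f4d734e-dda9-49d6-8706-1d7947c87426` and `pvc-a20008bc-8fa3-4095-9a27-3bde060c5d49`: exactly the two PVs from the deleted PVCs, which stay in `Released` state although their filesets are gone.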

To Reproduce

Steps to reproduce the behavior:

  1. Create dynamic fileset PVCs on a remote cluster.
  2. Inject an error: keep the GUI node in the remote cluster up, but shut down the majority of the quorum nodes so that the file system is unmounted.
  3. Delete the PVCs.
  4. After a few minutes, fix the remote cluster issue by starting all quorum nodes.

Expected behavior

PVs should be deleted when their filesets are deleted.
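The CSI specification requires DeleteVolume to be idempotent: if the volume no longer exists, the plugin must reply with success, which lets the external-provisioner remove the PV after a retry. One possible sequence consistent with the state above, not confirmed in this thread, is that the fileset deletion completed on the recovered cluster after the first DeleteVolume call had already failed, and later retries then failed because the fileset was missing. The following Python sketch models that sequence with an in-memory fake backend; every name in it (`FakeScaleBackend`, `delete_volume_strict`, `delete_volume_idempotent`, `provisioner_delete`) is hypothetical and not taken from the driver.

```python
"""Model of CSI DeleteVolume retries against a cluster that recovers late.

All class and function names here are hypothetical; they are not the
ibm-spectrum-scale-csi driver's code or API.
"""


class FilesetNotFound(Exception):
    """Raised by the fake backend when the fileset does not exist."""


class FakeScaleBackend:
    """In-memory stand-in for the owning Spectrum Scale cluster."""

    def __init__(self, filesets, lose_first_reply=False):
        self.filesets = set(filesets)
        # When True, the first delete succeeds on the cluster, but the caller
        # gets a timeout instead of the success reply.
        self.lose_first_reply = lose_first_reply

    def delete_fileset(self, name):
        if name not in self.filesets:
            raise FilesetNotFound(name)
        self.filesets.remove(name)
        if self.lose_first_reply:
            self.lose_first_reply = False
            raise TimeoutError("reply lost while the cluster was recovering")


def delete_volume_strict(backend, volume_id):
    """Non-idempotent delete: a missing fileset is reported as an error."""
    backend.delete_fileset(volume_id)


def delete_volume_idempotent(backend, volume_id):
    """Idempotent delete: a fileset that is already gone counts as success."""
    try:
        backend.delete_fileset(volume_id)
    except FilesetNotFound:
        pass  # deleted by an earlier attempt whose reply was lost


def provisioner_delete(delete_fn, backend, volume_id, max_attempts=3):
    """Retry delete_fn; True means the PV object may now be removed."""
    for _ in range(max_attempts):
        try:
            delete_fn(backend, volume_id)
            return True
        except (TimeoutError, FilesetNotFound):
            continue  # a real provisioner would typically back off here
    return False


if __name__ == "__main__":
    vol = "pvc-a20008bc-8fa3-4095-9a27-3bde060c5d49"
    for fn in (delete_volume_strict, delete_volume_idempotent):
        backend = FakeScaleBackend({vol}, lose_first_reply=True)
        print(fn.__name__, provisioner_delete(fn, backend, vol))
```

Running it prints `delete_volume_strict False` followed by `delete_volume_idempotent True`. In both runs the fileset is gone after the first attempt, but only the idempotent variant lets a retry succeed, so only then may the PV be removed.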

Environment

Please run the following and paste your output here:

 oc version
Client Version: 4.5.4
Server Version: 4.5.4
Kubernetes Version: v1.18.3+012b3ec

GPFS version of remote cluster
 mmdiag
Current GPFS build: "5.0.5.2 ".
Built on Aug  3 2020 at 21:11:03
Running 1 day 19 hours 13 minutes 35 secs, pid 3367

    Container ID:  cri-o://59c926379040b71f6aec5ef5ee9316bbb79d361a499f263ca65e45618b0bc161
    Image:         quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-driver:dev

  csi-provisioner:
    Container ID:  cri-o://d2bcebce1dfb75cead653512cc3d7a4a452d17c1b4aadadd57455334d0449ff3
    Image:         quay.io/k8scsi/csi-provisioner:v1.5.0
    Image ID:      quay.io/k8scsi/csi-provisioner@sha256:e10aab64506dd46

# Deployment


gandhisanjayv commented 4 years ago

Debug data is in /u/DUMPS/git-csi-issue296/

deeghuge commented 3 years ago

@gandhisanjayv, do the logs get deleted after some time? The /u/DUMPS/git-csi-issue296/ directory does not seem to exist on glogin10.

Jainbrt commented 2 years ago

@gandhisanjayv, is this issue still reproducible?

Jainbrt commented 2 years ago

@gandhisanjayv, could you please add the Customer Impact & Customer probability labels to the issue?