IBM / ibm-spectrum-scale-csi

The IBM Spectrum Scale Container Storage Interface (CSI) project enables container orchestrators, such as Kubernetes and OpenShift, to manage the life-cycle of persistent storage.

PVCs are not getting restored when there are 100 snapshots #1017

Closed saurabhwani5 closed 1 year ago

saurabhwani5 commented 1 year ago

Describe the bug

Restored PVCs do not reach the Bound state when they are restored from 100 snapshots. The source consists of 100 version 1 Independent PVCs, each holding 400+ MB of data written with ioMixer, with one snapshot taken per PVC. Restoring PVCs from these snapshots fails.

How to Reproduce?

  1. Install CSI 2.10.0 - (OCP -local snc)

    [root@local-snc ~]# oc get pods
    NAME                                                  READY   STATUS    RESTARTS       AGE
    ibm-spectrum-scale-csi-66l59                          3/3     Running   0              4d5h
    ibm-spectrum-scale-csi-9bt5t                          3/3     Running   0              4d5h
    ibm-spectrum-scale-csi-attacher-6cbfcb9b6-6qs5r       1/1     Running   0              4d11h
    ibm-spectrum-scale-csi-attacher-6cbfcb9b6-tljxj       1/1     Running   3 (43h ago)    4d11h
    ibm-spectrum-scale-csi-provisioner-5fc97ff9df-hcbph   1/1     Running   0              9h
    ibm-spectrum-scale-csi-resizer-f4dd6f596-dhrx8        1/1     Running   1 (4d5h ago)   4d11h
    ibm-spectrum-scale-csi-snapshotter-799cf54b85-6mmxt   1/1     Running   1 (4d5h ago)   4d11h
    ibm-spectrum-scale-csi-w2qlz                          3/3     Running   0              4d5h
    [root@local-snc ~]# oc get cso
    NAME                     VERSION   SUCCESS
    ibm-spectrum-scale-csi   2.10.0    True
    [root@local-snc ~]# oc describe pod | grep quay
    Image:         quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-driver:v2.10.0-080923
    Image ID:      quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-driver@sha256:353f4ad71c3fa62d2f283d66341a78996c20d139d87daa64a6871fdabf93207d
    Image:         quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-driver:v2.10.0-080923
    Image ID:      quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-driver@sha256:353f4ad71c3fa62d2f283d66341a78996c20d139d87daa64a6871fdabf93207d
    Image:         quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-driver:v2.10.0-080923
    Image ID:      quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-driver@sha256:353f4ad71c3fa62d2f283d66341a78996c20d139d87daa64a6871fdabf93207d
    [root@local-snc ~]#
  2. Set the mmxcp maximum number of parallel copy and sync commands to 100:

    [root@worker0 /]# mmxcp config --get-max-value
    [I] Current maximum number of parallel copy and sync commands in this cluster: 10
    [root@worker0 /]#  mmxcp config --set-max-value 100
    [I] Successfully set maximum number of parallel copy and sync commands in this cluster: 100
    [root@worker0 /]# mmxcp config --get-max-value
    [I] Current maximum number of parallel copy and sync commands in this cluster: 100
  3. Change the provisioner to 100 worker threads:

    Args:
      --csi-address=$(ADDRESS)
      --timeout=3m
      --worker-threads=100
      --extra-create-metadata
      --v=5
      --default-fstype=gpfs
      --leader-election=true
      --leader-election-lease-duration=$(LEADER_ELECTION_LEASE_DURATION)
      --leader-election-renew-deadline=$(LEADER_ELECTION_RENEW_DEADLINE)
      --leader-election-retry-period=$(LEADER_ELECTION_RETRY_PERIOD)

    Set the operator replica count to 0 before making this change.

  4. Create 100 - version 1 Independent PVCs:

    [root@local-snc ~]# oc get pvc | grep pvc -c
    100
  5. Write 400+ MB of data to each of the 100 PVCs using iotools:

    [root@io-test-source-pod-91 data1]# ls
    ioMixer
    [root@io-test-source-pod-91 data1]# du -sh *
    432M    ioMixer
  6. Take snapshot of each PVC:

    
    [root@local-snc snap]# cat vsc.yaml
    apiVersion: snapshot.storage.k8s.io/v1
    kind: VolumeSnapshotClass
    metadata:
      name: ibm-spectrum-scale-snapshotclass
    driver: spectrumscale.csi.ibm.com
    deletionPolicy: Delete
    [root@local-snc snap]# oc apply -f vsc.yaml
    volumesnapshotclass.snapshot.storage.k8s.io/ibm-spectrum-scale-snapshotclass created

    [root@local-snc Independent]# cat snap/snap.yaml
    apiVersion: snapshot.storage.k8s.io/v1
    kind: VolumeSnapshot
    metadata:
      name: ibm-spectrum-scale-snapshot
      namespace: default
    spec:
      volumeSnapshotClassName: ibm-spectrum-scale-snapshotclass
      source:
        persistentVolumeClaimName: scale-fset-independent-pvc-
    [root@local-snc ~]# oc get vs | grep snapshot -c
    100

  7. Try to restore PVCs from these snapshots. After the restore attempt, 51 PVCs remain in Pending state:

    [root@local-snc ~]# oc get pvc | grep snapshot | grep Pending -c
    51
    [root@local-snc ~]# oc describe pvc ibm-spectrum-scale-pvc-from-snapshot-8 -n default
    Name:          ibm-spectrum-scale-pvc-from-snapshot-8
    Namespace:     default
    StorageClass:  ibm-spectrum-scale-csi-fileset-independent
    Status:        Pending
    Volume:
    Labels:
    Annotations:   volume.beta.kubernetes.io/storage-provisioner: spectrumscale.csi.ibm.com
                   volume.kubernetes.io/storage-provisioner: spectrumscale.csi.ibm.com
    Finalizers:    [kubernetes.io/pvc-protection]
    Capacity:
    Access Modes:
    VolumeMode:    Filesystem
    DataSource:
      APIGroup:  snapshot.storage.k8s.io
      Kind:      VolumeSnapshot
      Name:      ibm-spectrum-scale-snapshot-8
    Used By:
    Events:
      Type  Reason  Age  From  Message
      Warning  ProvisioningFailed  24m  spectrumscale.csi.ibm.com_ibm-spectrum-scale-csi-provisioner-5fc97ff9df-hcbph_4cd251f1-d538-4c35-9a30-d643f80d2ccb  failed to provision volume with StorageClass "ibm-spectrum-scale-csi-fileset-independent": rpc error: code = DeadlineExceeded desc = context deadline exceeded
      Warning  ProvisioningFailed  22m (x5 over 24m)  spectrumscale.csi.ibm.com_ibm-spectrum-scale-csi-provisioner-5fc97ff9df-hcbph_4cd251f1-d538-4c35-9a30-d643f80d2ccb  failed to provision volume with StorageClass "ibm-spectrum-scale-csi-fileset-independent": rpc error: code = Aborted desc = volume creation already in process : pvc-f1e2ff6d-d14b-4995-a2e2-e12b7ca9c092
      Normal  ExternalProvisioning  2m38s (x105 over 27m)  persistentvolume-controller  waiting for a volume to be created, either by external provisioner "spectrumscale.csi.ibm.com" or manually created by system administrator
      Normal  Provisioning  84s (x14 over 27m)  spectrumscale.csi.ibm.com_ibm-spectrum-scale-csi-provisioner-5fc97ff9df-hcbph_4cd251f1-d538-4c35-9a30-d643f80d2ccb  External provisioner is provisioning volume for claim "default/ibm-spectrum-scale-pvc-from-snapshot-8"
      Warning  ProvisioningFailed  84s (x8 over 21m)  spectrumscale.csi.ibm.com_ibm-spectrum-scale-csi-provisioner-5fc97ff9df-hcbph_4cd251f1-d538-4c35-9a30-d643f80d2ccb  failed to provision volume with StorageClass "ibm-spectrum-scale-csi-fileset-independent": rpc error: code = Internal desc = snapshot copy job had failed for snapshot: snapshot-7a8afa1c-a0ff-43af-83b0-951c1bca958a
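For anyone trying to reproduce steps 6 and 7 at scale, the 100 VolumeSnapshot and restore-PVC manifests could be generated with a small script rather than by hand. The following is a minimal sketch, not the script used in this report. It assumes that objects are numbered 1..N with the index appended to the name, that the source PVC prefix is supplied by the caller (the report shows it only up to the trailing dash), and that the restored PVCs use `ReadWriteMany` with a `1Gi` default request; the access mode and size are not stated in the report. The function names `gen_snapshots` and `gen_restore_pvcs` are hypothetical.

```shell
#!/bin/sh
# Sketch only: both functions print YAML to stdout and never call oc.

# gen_snapshots PVC_PREFIX COUNT
# Emits one VolumeSnapshot per source PVC.
# Assumption: source PVCs are named "<PVC_PREFIX><index>", index 1..COUNT.
gen_snapshots() {
  pvc_prefix="$1"
  count="$2"
  i=1
  while [ "$i" -le "$count" ]; do
    printf '%s\n' \
      '---' \
      'apiVersion: snapshot.storage.k8s.io/v1' \
      'kind: VolumeSnapshot' \
      'metadata:' \
      "  name: ibm-spectrum-scale-snapshot-${i}" \
      '  namespace: default' \
      'spec:' \
      '  volumeSnapshotClassName: ibm-spectrum-scale-snapshotclass' \
      '  source:' \
      "    persistentVolumeClaimName: ${pvc_prefix}${i}"
    i=$((i + 1))
  done
}

# gen_restore_pvcs COUNT [SIZE]
# Emits one PVC per snapshot, using the snapshot as its dataSource.
# Assumptions: ReadWriteMany access mode and a 1Gi default size.
gen_restore_pvcs() {
  count="$1"
  size="${2:-1Gi}"
  i=1
  while [ "$i" -le "$count" ]; do
    printf '%s\n' \
      '---' \
      'apiVersion: v1' \
      'kind: PersistentVolumeClaim' \
      'metadata:' \
      "  name: ibm-spectrum-scale-pvc-from-snapshot-${i}" \
      '  namespace: default' \
      'spec:' \
      '  accessModes:' \
      '    - ReadWriteMany' \
      '  resources:' \
      '    requests:' \
      "      storage: ${size}" \
      '  storageClassName: ibm-spectrum-scale-csi-fileset-independent' \
      '  dataSource:' \
      '    apiGroup: snapshot.storage.k8s.io' \
      '    kind: VolumeSnapshot' \
      "    name: ibm-spectrum-scale-snapshot-${i}"
    i=$((i + 1))
  done
}
```

The output of either function could then be piped to `oc apply -f -`, for example `gen_restore_pvcs 100 | oc apply -f -`, which submits all restore requests at once and so exercises the parallel copy path configured in steps 2 and 3.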


Observation: 51 of the 100 restored PVCs are in Pending state.
Error Logs:
`[[EFSSG0264C The path /mnt/local-sample/pvc-e507babf-cbb7-4c44-8263-1882108ae71d/.snapshots/snapshot-363933cf-4000-47be-b385-9da40017378a/pvc-e507babf-cbb7-4c44-8263-1882108ae71d-data does not exist.]]`
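To track how many restored PVCs are stuck while the restore is in progress, the `grep ... -c` pipeline from step 7 could be extended to tally every status at once. The sketch below is one possible way to do that; the helper name `count_pvc_status` is hypothetical, and it assumes the default `oc get pvc` table layout with a header line, NAME in the first column, and STATUS in the second.

```shell
#!/bin/sh
# count_pvc_status NAME_SUBSTRING
# Reads `oc get pvc` table output on stdin (header line included) and prints
# "<STATUS> <count>" for every claim whose name contains NAME_SUBSTRING.
count_pvc_status() {
  awk -v pat="$1" '
    NR > 1 && index($1, pat) { n[$2]++ }      # skip header, match on NAME
    END { for (s in n) print s, n[s] }        # one line per STATUS value
  ' | sort
}
```

A typical invocation would be `oc get pvc | count_pvc_status snapshot`, which prints one line per status for the restored claims.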

Expected behavior

All PVCs should be restored successfully, and CSI should retry on failure.

Must-gather uploaded in `scale-csi/D.1017`

amdabhad commented 1 year ago

Closing this because it is a GUI issue; a GitHub issue has been created for it: https://github.ibm.com/IBMSpectrumScale/scale-core/issues/6033