IBM / ibm-spectrum-scale-csi

The IBM Spectrum Scale Container Storage Interface (CSI) project enables container orchestrators, such as Kubernetes and OpenShift, to manage the life-cycle of persistent storage.
Apache License 2.0

new restored pvcs remain in Pending state if driver pods are restarted during restore snapshot operation (for 1 million files snapshot) #389

Closed kulkarnicr closed 8 months ago

kulkarnicr commented 3 years ago

**Describe the bug**
Restarting the driver pods during a 1-million-file restore snapshot operation leaves the new PVCs in the Pending state.

  1. Create a PVC, add 1 million files to it, and create a snapshot of this PVC (i.e., the snapshot has 1 million files).
  2. Restore the 1-million-file snapshot to two new PVCs simultaneously.
  3. While the restore snapshot operation is in progress (i.e., mmapplypolicy/mmxcp has not yet finished), restart the driver pods and ensure the pods come back up after the restart.
  4. Verify that 1 million files get copied even after the driver pod restart and both new restored PVCs go to Bound state. Observed that:
    • all 1 million files were copied to both new PVCs (in roughly 4 hours)
    • however, both new PVCs remained in Pending state (even after 13 hours)

I kept debug data for this issue on the test cluster. Please feel free to access the test system for live debugging: t-x-master.fyre.ibm.com:~/pvc-pending-when-driver-pods-restarted/
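For reference, step 1 of the reproduction below creates the VolumeSnapshot vs1-fs2-million from PVC pvc1-fs2-million with snapshot class vsclass1. The manifest used is not attached to this issue, so the following is only a minimal sketch of such a snapshot, with field values taken from the knvs output in step 1:

```bash
# Sketch only: a VolumeSnapshot matching the names shown in the 'knvs' output below.
# The actual manifest used for this test is not attached to the issue.
# Depending on the installed snapshot CRDs, apiVersion may need to be snapshot.storage.k8s.io/v1beta1.
kubectl apply -f - <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: vs1-fs2-million
spec:
  volumeSnapshotClassName: vsclass1
  source:
    persistentVolumeClaimName: pvc1-fs2-million
EOF
```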

**To Reproduce**
Steps to reproduce the behavior:

  1. Create a PVC, add 1 million files to it, and create a snapshot of this PVC (i.e., the snapshot has 1 million files).

    [root@t-x-master 2021_03_11-04:03:21 ~]$ knvs vs1-fs2-million
    NAME              READYTOUSE   SOURCEPVC          SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS   SNAPSHOTCONTENT                                    CREATIONTIME   AGE
    vs1-fs2-million   true         pvc1-fs2-million                           20Gi          vsclass1        snapcontent-ac78da22-4977-4946-ba8c-98547f82cc0b   36d            36d
    [root@t-x-master 2021_03_11-04:03:21 ~]$
  2. Restore the 1-million-file snapshot to two new PVCs simultaneously.

    [root@t-x-master 2021_03_11-04:03:25 test_snapshot]$ cat fix-restore2-vs1-fs2-million-to-fs3.yaml
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: fix-restore2-vs1-fs2-million-to-fs3
    spec:
      accessModes:
      - ReadWriteMany
      resources:
        requests:
          storage: 20Gi
      storageClassName: sc4-fs3-million
      #storageClassName: sc2-fs3-million
      #storageClassName: sc3-fs3-snapcp
      dataSource:
        name: vs1-fs2-million
        kind: VolumeSnapshot
        apiGroup: snapshot.storage.k8s.io
    [root@t-x-master 2021_03_11-04:03:31 test_snapshot]$
    [root@t-x-master 2021_03_11-04:03:32 test_snapshot]$ cat fix-restore3-vs1-fs2-million-to-fs3.yaml
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: fix-restore3-vs1-fs2-million-to-fs3
    spec:
      accessModes:
      - ReadWriteMany
      resources:
        requests:
          storage: 20Gi
      storageClassName: sc4-fs3-million
      #storageClassName: sc3-fs3-million
      #storageClassName: sc3-fs3-snapcp
      dataSource:
        name: vs1-fs2-million
        kind: VolumeSnapshot
        apiGroup: snapshot.storage.k8s.io
    [root@t-x-master 2021_03_11-04:03:40 test_snapshot]$
    [root@t-x-master 2021_03_11-04:04:25 test_snapshot]$ mmxcp list all
    [I] There are no parallel copy commands currently active in the cluster.
    [E] The active parallel copy command with ID 'all' could not be found.
    mmxcp: Command failed. Examine previous error messages to determine cause.
    [root@t-x-master 2021_03_11-04:04:29 test_snapshot]$
    [root@t-x-master 2021_03_11-04:04:31 test_snapshot]$ kn apply -f fix-restore2-vs1-fs2-million-to-fs3.yaml
    persistentvolumeclaim/fix-restore2-vs1-fs2-million-to-fs3 created
    [root@t-x-master 2021_03_11-04:04:47 test_snapshot]$ knpvc fix-restore2-vs1-fs2-million-to-fs3
    NAME                                  STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS      AGE
    fix-restore2-vs1-fs2-million-to-fs3   Pending                                      sc4-fs3-million   69s
    [root@t-x-master 2021_03_11-04:05:55 test_snapshot]$
    [root@t-x-master 2021_03_11-04:06:47 test_snapshot]$ mmxcp list all
    PARALLEL_COPY_ID:XCP1615464324
    PARALLEL_COPY_SOURCE_PATH:/mnt/fs2/pvc-e5766ec9-080b-4d2f-b24c-feaa64717360/.snapshots/snapshot-ac78da22-4977-4946-ba8c-98547f82cc0b/pvc-e5766ec9-080b-4d2f-b24c-feaa64717360-data
    PARALLEL_COPY_TARGET_PATH:/mnt/fs3/pvc-92fdea41-7dc3-4510-97f1-16fe09ec6bb4/pvc-92fdea41-7dc3-4510-97f1-16fe09ec6bb4-data
    PARALLEL_COPY_SOURCE_DEVICE:fs2
    PARALLEL_COPY_TARGET_DEVICE:fs3
    PARALLEL_COPY_NODE_LIST:10.11.82.111
    PARALLEL_COPY_START_TIME:Thu Mar 11 04-05-24 2021
    [root@t-x-master 2021_03_11-04:06:55 test_snapshot]$
    [root@t-x-master 2021_03_11-04:06:56 test_snapshot]$ kn apply -f fix-restore3-vs1-fs2-million-to-fs3.yaml
    persistentvolumeclaim/fix-restore3-vs1-fs2-million-to-fs3 created
    [root@t-x-master 2021_03_11-04:07:10 test_snapshot]$ knpvc fix-restore3-vs1-fs2-million-to-fs3
    NAME                                  STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS      AGE
    fix-restore3-vs1-fs2-million-to-fs3   Pending                                      sc4-fs3-million   8s
    [root@t-x-master 2021_03_11-04:08:27 test_snapshot]$ mmxcp list all
    PARALLEL_COPY_ID:XCP1615464324
    PARALLEL_COPY_SOURCE_PATH:/mnt/fs2/pvc-e5766ec9-080b-4d2f-b24c-feaa64717360/.snapshots/snapshot-ac78da22-4977-4946-ba8c-98547f82cc0b/pvc-e5766ec9-080b-4d2f-b24c-feaa64717360-data
    PARALLEL_COPY_TARGET_PATH:/mnt/fs3/pvc-92fdea41-7dc3-4510-97f1-16fe09ec6bb4/pvc-92fdea41-7dc3-4510-97f1-16fe09ec6bb4-data
    PARALLEL_COPY_SOURCE_DEVICE:fs2
    PARALLEL_COPY_TARGET_DEVICE:fs3
    PARALLEL_COPY_NODE_LIST:10.11.82.111
    PARALLEL_COPY_START_TIME:Thu Mar 11 04-05-24 2021
    PARALLEL_COPY_ID:XCP1615464496
    PARALLEL_COPY_SOURCE_PATH:/mnt/fs2/pvc-e5766ec9-080b-4d2f-b24c-feaa64717360/.snapshots/snapshot-ac78da22-4977-4946-ba8c-98547f82cc0b/pvc-e5766ec9-080b-4d2f-b24c-feaa64717360-data
    PARALLEL_COPY_TARGET_PATH:/mnt/fs3/pvc-631c2241-230b-487a-94a6-336bbd5d4103/pvc-631c2241-230b-487a-94a6-336bbd5d4103-data
    PARALLEL_COPY_SOURCE_DEVICE:fs2
    PARALLEL_COPY_TARGET_DEVICE:fs3
    PARALLEL_COPY_NODE_LIST:10.11.82.111
    PARALLEL_COPY_START_TIME:Thu Mar 11 04-08-16 2021
  3. While the restore snapshot operation is in progress (i.e., mmapplypolicy/mmxcp has not yet finished), restart the driver pods and ensure the pods come back up after the restart.

    
    [root@t-x-master 2021_03_11-04:09:13 test_snapshot]$ kn get pods -o wide
    NAME                                              READY   STATUS    RESTARTS   AGE     IP             NODE                       NOMINATED NODE   READINESS GATES
    csi-scale-staticdemo-pod                          1/1     Running   4          27d     10.244.2.46    t-x-worker2.fyre.ibm.com   <none>           <none>
    ibm-spectrum-scale-csi-9rbjg                      2/2     Running   0          7h25m   10.11.82.234   t-x-worker2.fyre.ibm.com   <none>           <none>
    ibm-spectrum-scale-csi-attacher-0                 1/1     Running   15         15d     10.244.2.35    t-x-worker2.fyre.ibm.com   <none>           <none>
    ibm-spectrum-scale-csi-bhzbr                      2/2     Running   0          7h25m   10.11.82.117   t-x-worker1.fyre.ibm.com   <none>           <none>
    ibm-spectrum-scale-csi-operator-fdbfc4665-lwvgs   1/1     Running   0          8d      10.244.2.45    t-x-worker2.fyre.ibm.com   <none>           <none>
    ibm-spectrum-scale-csi-provisioner-0              1/1     Running   15         15d     10.244.1.22    t-x-worker1.fyre.ibm.com   <none>           <none>
    ibm-spectrum-scale-csi-snapshotter-0              1/1     Running   15         15d     10.244.2.36    t-x-worker2.fyre.ibm.com   <none>           <none>
    [root@t-x-master 2021_03_11-04:10:09 test_snapshot]$ kn delete pod ibm-spectrum-scale-csi-9rbjg
    pod "ibm-spectrum-scale-csi-9rbjg" deleted
    [root@t-x-master 2021_03_11-04:10:35 test_snapshot]$ kn delete pod ibm-spectrum-scale-csi-bhzbr
    pod "ibm-spectrum-scale-csi-bhzbr" deleted
    [root@t-x-master 2021_03_11-04:11:45 test_snapshot]$ kn get pods -o wide
    NAME                                              READY   STATUS    RESTARTS   AGE   IP             NODE                       NOMINATED NODE   READINESS GATES
    csi-scale-staticdemo-pod                          1/1     Running   4          27d   10.244.2.46    t-x-worker2.fyre.ibm.com   <none>           <none>
    ibm-spectrum-scale-csi-4kdbz                      2/2     Running   0          73s   10.11.82.234   t-x-worker2.fyre.ibm.com   <none>           <none>
    ibm-spectrum-scale-csi-attacher-0                 1/1     Running   16         15d   10.244.2.35    t-x-worker2.fyre.ibm.com   <none>           <none>
    ibm-spectrum-scale-csi-c8twb                      2/2     Running   0          62s   10.11.82.117   t-x-worker1.fyre.ibm.com   <none>           <none>
    ibm-spectrum-scale-csi-operator-fdbfc4665-lwvgs   1/1     Running   0          8d    10.244.2.45    t-x-worker2.fyre.ibm.com   <none>           <none>
    ibm-spectrum-scale-csi-provisioner-0              1/1     Running   16         15d   10.244.1.22    t-x-worker1.fyre.ibm.com   <none>           <none>
    ibm-spectrum-scale-csi-snapshotter-0              1/1     Running   16         15d   10.244.2.36    t-x-worker2.fyre.ibm.com   <none>           <none>

4. Verify that 1 million files get copied after the driver pod restart and both new restored PVCs go to Bound state.
Observed that:
- Using a script (a sketch follows these observations), I kept tracking the number of files copied to the new restored PVCs and observed that all 1 million files were copied to both new PVCs (in roughly 4 hours).

NAME                                  STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS      AGE
fix-restore2-vs1-fs2-million-to-fs3   Pending                                      sc4-fs3-million   3h46m
Thu Mar 11 07:51:40 PST 2021 PVC Filecount
1048576 1048576 7277504

NAME                                  STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS      AGE
fix-restore3-vs1-fs2-million-to-fs3   Pending                                      sc4-fs3-million   3h44m
Thu Mar 11 07:51:40 PST 2021 PVC Filecount
1048576 1048576 7277504


- However, both new PVCs remained in Pending state (even after 13 hours).

[root@t-x-master 2021_03_11-17:52:45 ~]$ knpvc fix-restore2-vs1-fs2-million-to-fs3
NAME                                  STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS      AGE
fix-restore2-vs1-fs2-million-to-fs3   Pending                                      sc4-fs3-million   13h
[root@t-x-master 2021_03_11-17:52:50 ~]$
[root@t-x-master 2021_03_11-17:52:51 ~]$ knpvc fix-restore3-vs1-fs2-million-to-fs3
NAME                                  STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS      AGE
fix-restore3-vs1-fs2-million-to-fs3   Pending                                      sc4-fs3-million   13h
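The tracking script itself is not attached to this issue; a minimal sketch of that kind of loop, assuming knpvc is an alias for kubectl get pvc in the test namespace and that the restored filesets are linked under /mnt/fs3, could look like:

```bash
# Minimal sketch of a PVC-state / file-count tracker (not the exact script used).
# Assumes 'knpvc' aliases 'kubectl -n <namespace> get pvc' and that the restored
# filesets are linked at /mnt/fs3/<pv-name>/<pv-name>-data.
while true; do
    date
    knpvc fix-restore2-vs1-fs2-million-to-fs3 fix-restore3-vs1-fs2-million-to-fs3
    for dir in /mnt/fs3/pvc-*/pvc-*-data; do
        echo "$dir: $(find "$dir" -type f | wc -l) files"
    done
    sleep 300
done
```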


**Expected behavior**
All 1 million files should be copied after the driver pod restart, and both new restored PVCs should go to Bound state.

**Environment**
Please run the following and paste your output here:
``` bash
# Development
operator-sdk version 
go version

# Deployment
kubectl version
rpm -qa | grep gpfs
[root@t-x-master 2021_03_11-17:53:10 ~]$ kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2", GitCommit:"faecb196815e248d3ecfb03c680a4507229c2a56", GitTreeState:"clean", BuildDate:"2021-01-13T13:28:09Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2", GitCommit:"faecb196815e248d3ecfb03c680a4507229c2a56", GitTreeState:"clean", BuildDate:"2021-01-13T13:20:00Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
[root@t-x-master 2021_03_11-18:01:10 ~]$ rpm -qa | grep gpfs
gpfs.base-5.1.1-0.210201.112049.x86_64
gpfs.license.dm-5.1.1-0.210201.112049.x86_64
gpfs.gss.pmcollector-5.1.1-0.el7.x86_64
gpfs.gskit-8.0.55-19.x86_64
gpfs.msg.en_US-5.1.1-0.210201.112049.noarch
gpfs.gpl-5.1.1-0.210201.112049.noarch
gpfs.adv-5.1.1-0.210201.112049.x86_64
gpfs.crypto-5.1.1-0.210201.112049.x86_64
gpfs.gss.pmsensors-5.1.1-0.el7.x86_64
gpfs.java-5.1.1-0.210201.112049.x86_64
gpfs.gui-5.1.1-0.210201.114540.noarch
gpfs.docs-5.1.1-0.210201.112049.noarch
gpfs.compression-5.1.1-0.210201.112049.x86_64
[root@t-x-master 2021_03_11-18:01:12 ~]$
```


**Additional context**
Note that restoring the 1-million-file snapshot to two PVCs simultaneously worked fine (all files were copied and the PVCs went to Bound state) when the driver pods were not restarted; that case was verified in issue #366. When the driver pods are restarted, the restored PVCs do not reach the Bound (i.e., usable) state, as described in this issue.

kulkarnicr commented 3 years ago

I also observed that for the 1-million-file restore snapshot operation, the mmapplypolicy command was executed with only one node (the -N option of mmapplypolicy was given only one node).

My understanding was that if the filesystem is mounted on multiple nodes, then mmxcp's parallel copy behavior would be utilized (by passing multiple nodes to the -N option).
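For comparison, a hypothetical invocation that fans the copy out across multiple nodes would pass a comma-separated node list to -N; the node names and policy file path below are placeholders, not values generated by the driver (the logs below show the driver passing a single node to -N):

```bash
# Hypothetical multi-node invocation; node names and the policy file path are placeholders.
mmapplypolicy /mnt/fs2/pvc-e5766ec9-080b-4d2f-b24c-feaa64717360/.snapshots/snapshot-ac78da22-4977-4946-ba8c-98547f82cc0b/pvc-e5766ec9-080b-4d2f-b24c-feaa64717360-data \
    -P /var/mmfs/tmp/tmpPolicyFile \
    -N t-x-worker1.fyre.ibm.com,t-x-worker2.fyre.ibm.com \
    --scope=inodespace
```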

Mar 11 04:05:07 t-x-master mmfs[23166]: REST-CLI root admin [EXIT, CHANGE] 'mmcrfileset fs3 pvc-92fdea41-7dc3-4510-97f1-16fe09ec6bb4 -t Fileset created by IBM Container Storage Interface driver --inode-space new --inode-limit 1600000:1600000 --allow-permission-change chmodAndSetAcl' RC=0
Mar 11 04:05:08 t-x-master systemd: Started Session c129270 of user root.
Mar 11 04:05:09 t-x-master systemd: Started Session c129271 of user root.
Mar 11 04:05:09 t-x-master mmfs[23391]: REST-CLI root admin [EXIT, CHANGE] 'mmlinkfileset fs3 pvc-92fdea41-7dc3-4510-97f1-16fe09ec6bb4 -J /mnt/fs3/pvc-92fdea41-7dc3-4510-97f1-16fe09ec6bb4' RC=0
Mar 11 04:05:10 t-x-master systemd: Started Session c129272 of user root.
Mar 11 04:05:10 t-x-master systemd: Started Session c129273 of user root.
Mar 11 04:05:12 t-x-master systemd: Started Session c129274 of user root.
Mar 11 04:05:12 t-x-master systemd: Started Session c129275 of user root.
Mar 11 04:05:13 t-x-master systemd: Started Session c129276 of user root.
Mar 11 04:05:13 t-x-master mmfs[23756]: REST-CLI root admin [EXIT, CHANGE] 'mmsetquota fs3:pvc-92fdea41-7dc3-4510-97f1-16fe09ec6bb4 --block 21474836480:21474836480' RC=0
Mar 11 04:05:24 t-x-master mmfs[25262]: REST-CLI root admin [ENTRY, CHANGE] 'mmapplypolicy /mnt/fs2/pvc-e5766ec9-080b-4d2f-b24c-feaa64717360/.snapshots/snapshot-ac78da22-4977-4946-ba8c-98547f82cc0b/pvc-e5766ec9-080b-4d2f-b24c-feaa64717360-data -P /var/mmfs/tmp/cmdTmpDir.mmxcp.24664/tmpPolicyFile -N 10.11.82.111 --scope=inodespace'
...
Mar 11 04:07:32 t-x-master systemd: Started Session c129315 of user root.
Mar 11 04:07:40 t-x-master mmfs[11822]: REST-CLI root admin [EXIT, CHANGE] 'mmcrfileset fs3 pvc-631c2241-230b-487a-94a6-336bbd5d4103 -t Fileset created by IBM Container Storage Interface driver --inode-space new --inode-limit 1600000:1600000 --allow-permission-change chmodAndSetAcl' RC=0
Mar 11 04:07:42 t-x-master systemd: Started Session c129316 of user root.
Mar 11 04:07:44 t-x-master systemd: Started Session c129317 of user root.
Mar 11 04:07:45 t-x-master mmfs[14421]: REST-CLI root admin [EXIT, CHANGE] 'mmlinkfileset fs3 pvc-631c2241-230b-487a-94a6-336bbd5d4103 -J /mnt/fs3/pvc-631c2241-230b-487a-94a6-336bbd5d4103' RC=0
Mar 11 04:07:46 t-x-master systemd: Started Session c129318 of user root.
Mar 11 04:07:48 t-x-master systemd: Started Session c129319 of user root.
Mar 11 04:07:49 t-x-master systemd: Started Session c129320 of user root.
Mar 11 04:07:49 t-x-master systemd: Started Session c129321 of user root.
Mar 11 04:07:50 t-x-master systemd: Started Session c129322 of user root.
Mar 11 04:07:51 t-x-master mmfs[17320]: REST-CLI root admin [EXIT, CHANGE] 'mmsetquota fs3:pvc-631c2241-230b-487a-94a6-336bbd5d4103 --block 21474836480:21474836480' RC=0
Mar 11 04:08:18 t-x-master mmfs[31533]: REST-CLI root admin [ENTRY, CHANGE] 'mmapplypolicy /mnt/fs2/pvc-e5766ec9-080b-4d2f-b24c-feaa64717360/.snapshots/snapshot-ac78da22-4977-4946-ba8c-98547f82cc0b/pvc-e5766ec9-080b-4d2f-b24c-feaa64717360-data -P /var/mmfs/tmp/cmdTmpDir.mmxcp.22427/tmpPolicyFile -N 10.11.82.111 --scope=inodespace'

The filesystem (fs3) was mounted on all nodes:

[root@t-x-master 2021_03_11-18:25:17 tmp]$ mmlsmount fs3
File system fs3 is mounted on 3 nodes.
[root@t-x-master 2021_03_11-18:25:26 tmp]$
[root@t-x-master 2021_03_11-18:25:27 tmp]$ mmdsh -N all " df -h | grep fs3 "
t-x-worker2.fyre.ibm.com:  fs3                    100G   57G   44G  57% /mnt/fs3
t-x-worker1.fyre.ibm.com:  fs3                    100G   57G   44G  57% /mnt/fs3
t-x-master.fyre.ibm.com:  fs3                                   100G   57G   44G  57% /mnt/fs3
[root@t-x-master 2021_03_11-18:25:36 tmp]$
kulkarnicr commented 3 years ago

Verified the issue with the following builds; it is not reproducible:

================================================
Tue 16 Mar 2021 03:01:09 AM PDT PVC State
NAME                                  STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS      AGE
fix-restore2-vs1-fs2-million-to-fs3   Pending                                      sc4-fs3-million   6h
Tue 16 Mar 2021 03:01:09 AM PDT PVC Filecount
1048576 1048576 7277504
================================================
Tue 16 Mar 2021 03:23:48 AM PDT PVC State
NAME                                  STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS      AGE
fix-restore3-vs1-fs2-million-to-fs3   Pending                                      sc4-fs3-million   6h18m
Tue 16 Mar 2021 03:23:48 AM PDT PVC Filecount
1048576 1048576 7277504
================================================
Tue 16 Mar 2021 05:43:39 AM PDT PVC State
NAME                                  STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
fix-restore2-vs1-fs2-million-to-fs3   Bound    pvc-f6728087-576c-4830-9856-e32b8ef68408   20Gi       RWX            sc4-fs3-million   8h
Tue 16 Mar 2021 05:43:39 AM PDT PVC Filecount
1048576 1048576 7277504
================================================
Tue 16 Mar 2021 06:00:03 AM PDT PVC State
NAME                                  STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
fix-restore3-vs1-fs2-million-to-fs3   Bound    pvc-1abbc71d-72c0-4d7f-ab39-03ffea557e72   20Gi       RWX            sc4-fs3-million   8h
Tue 16 Mar 2021 06:00:03 AM PDT PVC Filecount
1048576 1048576 7277504
kulkarnicr commented 3 years ago

Tried an extra test: if the driver pod remains down until mmapplypolicy/mmxcp finishes and the driver pod comes up later, I observed that another new instance of mmapplypolicy is triggered for the same data.

Mar 17 23:05:53 swiftest4 mmfs[3194055]: REST-CLI root admin [ENTRY, CHANGE] 'mmapplypolicy /ibm/fs1/pvc-342c1138-9c66-435a-9aff-70870f9737dd/.snapshots/snapshot-934d0f9a-be09-4721-aa63-9b56ca17e9b1/pvc-342c1138-9c66-435a-9aff-70870f9737dd-data -P /var/mmfs/tmp/cmdTmpDir.mmxcp.3193668/tmpPolicyFile -N 10.11.52.25 --scope=inodespace'
Mar 17 23:06:13 swiftest4 systemd[1]: Started Session 7041 of user root.
Mar 17 23:06:13 swiftest4 systemd[1]: session-7041.scope: Succeeded.
Mar 17 23:06:18 swiftest4 systemd[1]: Started Session 7042 of user root.
Mar 17 23:06:19 swiftest4 systemd[1]: session-7042.scope: Succeeded.
Mar 17 23:06:56 swiftest4 systemd[1]: Started Session 7043 of user root.
Mar 17 23:06:56 swiftest4 systemd[1]: session-7043.scope: Succeeded.
Mar 17 23:10:09 swiftest4 mmfs[3195644]: REST-CLI root admin [EXIT, CHANGE] 'mmapplypolicy /ibm/fs1/pvc-342c1138-9c66-435a-9aff-70870f9737dd/.snapshots/snapshot-934d0f9a-be09-4721-aa63-9b56ca17e9b1/pvc-342c1138-9c66-435a-9aff-70870f9737dd-data -P /var/mmfs/tmp/cmdTmpDir.mmxcp.3193668/tmpPolicyFile -N 10.11.52.25 --scope=inodespace' RC=0

[root@swiftest1 2021_03_17-23:10:34 test_snapshot]$ kubectl label node swiftest3.fyre.ibm.com scale=true --overwrite=true
node/swiftest3.fyre.ibm.com labeled

[root@swiftest1 2021_03_17-23:11:17 test_snapshot]$ kn get pods
NAME                                              READY   STATUS    RESTARTS   AGE
ibm-spectrum-scale-csi-66bvv                      2/2     Running   0          2d1h
ibm-spectrum-scale-csi-attacher-0                 1/1     Running   3          6d22h
ibm-spectrum-scale-csi-b6l6q                      2/2     Running   0          3s
ibm-spectrum-scale-csi-operator-8cb5f8c47-vp6n5   1/1     Running   0          2d18h
ibm-spectrum-scale-csi-provisioner-0              1/1     Running   4          6d22h
ibm-spectrum-scale-csi-snapshotter-0              1/1     Running   3          6d22h

Mar 17 23:11:24 swiftest4 mmfs[3196135]: REST-CLI root admin [EXIT, CHANGE] 'mmsetquota fs1:pvc-5edda101-edaf-49e8-aa9c-e9e5a140ccd7 --block 26843545600:26843545600' RC=0
Mar 17 23:11:28 swiftest4 mmfs[3196724]: REST-CLI root admin [ENTRY, CHANGE] 'mmapplypolicy /ibm/fs1/pvc-342c1138-9c66-435a-9aff-70870f9737dd/.snapshots/snapshot-934d0f9a-be09-4721-aa63-9b56ca17e9b1/pvc-342c1138-9c66-435a-9aff-70870f9737dd-data -P /var/mmfs/tmp/cmdTmpDir.mmxcp.3196193/tmpPolicyFile -N 10.11.52.25 --scope=inodespace'
Mar 17 23:13:11 swiftest4 systemd[1]: Started Session 7044 of user root.
Mar 17 23:13:11 swiftest4 systemd[1]: session-7044.scope: Succeeded.
Mar 17 23:14:26 swiftest4 systemd[1]: Started Session 7045 of user root.
Mar 17 23:14:26 swiftest4 systemd[1]: session-7045.scope: Succeeded.
Mar 17 23:15:01 swiftest4 CRON[3198714]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Mar 17 23:15:22 swiftest4 mmfs[3198828]: REST-CLI root admin [EXIT, CHANGE] 'mmapplypolicy /ibm/fs1/pvc-342c1138-9c66-435a-9aff-70870f9737dd/.snapshots/snapshot-934d0f9a-be09-4721-aa63-9b56ca17e9b1/pvc-342c1138-9c66-435a-9aff-70870f9737dd-data -P /var/mmfs/tmp/cmdTmpDir.mmxcp.3196193/tmpPolicyFile -N 10.11.52.25 --scope=inodespace' RC=0
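One way to confirm the duplicate run (a sketch, assuming the REST-CLI audit messages land in /var/log/messages on this node as in the log excerpts above) is to count the ENTRY records for the same snapshot source path:

```bash
# Sketch: count how many times mmapplypolicy was started for the same snapshot path.
# Assumes the REST-CLI audit lines are written to /var/log/messages on this node.
grep "ENTRY, CHANGE.*mmapplypolicy" /var/log/messages \
  | grep "snapshot-934d0f9a-be09-4721-aa63-9b56ca17e9b1" \
  | wc -l
```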

saurabhwani5 commented 8 months ago

Verified with CSI 2.11.0; the issue is not reproducible.

hemalathagajendran commented 8 months ago

@saurabhwani5 Please add logs and scenarios tested.