IBM / ibm-spectrum-scale-csi

The IBM Spectrum Scale Container Storage Interface (CSI) project enables container orchestrators, such as Kubernetes and OpenShift, to manage the life-cycle of persistent storage.
Apache License 2.0

PVCs remain in Pending state during a snapshot restore operation if the node where mmxcp is running is rebooted. #402

Closed kulkarnicr closed 8 months ago

kulkarnicr commented 3 years ago

**Describe the bug**
While restoring a snapshot (containing 10k files) to 2 PVCs, I rebooted the node where mmxcp was running. After the reboot, no new mmxcp process was triggered, the CCR file _xcpRunning still showed the entries for the mmxcp copies started before the reboot, and the new PVCs remained in Pending state.

**To Reproduce**
Steps to reproduce the behavior:

Note: in the commands below, read the abbreviation `kn` as `kubectl -n ibm-spectrum-scale-csi-driver` (a shell alias; see the sketch below).
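
For anyone reproducing this, the shorthand can be set up as shell aliases. A minimal sketch: only `kn` is spelled out in the note above; `knvs` and `knpvc` are assumed to wrap `get volumesnapshot` and `get pvc`, based on the output they print later in this report.

```bash
# Shorthand used in this report (sketch). Only "kn" is defined in the note
# above; "knvs" and "knpvc" are assumptions based on the output they produce.
alias kn='kubectl -n ibm-spectrum-scale-csi-driver'
alias knvs='kubectl -n ibm-spectrum-scale-csi-driver get volumesnapshot'
alias knpvc='kubectl -n ibm-spectrum-scale-csi-driver get pvc'
```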

  1. Restore a snapshot to new PVCs
    
    [root@r-x-master 2021_04_01-02:19:16 restore-10k-files]$ knvs vs1-fs2-tenk
    NAME           READYTOUSE   SOURCEPVC       SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS   SNAPSHOTCONTENT                                    CREATIONTIME   AGE
    vs1-fs2-tenk   true         pvc1-fs2-tenk                           5Gi           vsclass1        snapcontent-0ac7f259-7f4d-4ca2-81fd-bddbc9d71f79   10d            10d

    [root@r-x-master 2021_04_01-02:41:19 restore-10k-files]$ cat res31-tenk.yaml
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: res31-tenk
    spec:
      accessModes:
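
Since `res31-tenk.yaml` is truncated above, here is a hedged reconstruction of what a snapshot-restore PVC for this test would look like. The storage class, size, and access mode are assumptions taken from the `knvs` and `kn describe pvc` output elsewhere in this report, not a copy of the original file.

```bash
# Hypothetical reconstruction of res31-tenk.yaml (the original is truncated
# above). StorageClass, size, and access mode are inferred from other output
# in this report and may differ from the file actually used.
kubectl -n ibm-spectrum-scale-csi-driver apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: res31-tenk
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: sc-fs2-snapclass
  resources:
    requests:
      storage: 5Gi
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: vs1-fs2-tenk
EOF
```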

    Apr 1 02:20:19 r-x-master mmfs: [N] Expanding fs2 inode space 11 current 0 inodes (0 free) by 20480
    Apr 1 02:20:20 r-x-master mmfs: [N] Expanded fs2 inode space 11 from 0 to 20480 inodes (on-demand).
    Apr 1 02:20:20 r-x-master mmfs[26681]: REST-CLI root admin [EXIT, CHANGE] 'mmcrfileset fs2 pvc-467ecd6c-2fe1-461e-be54-a601e9d48803 -t Fileset created by IBM Container Storage Interface driver --inode-space new --inode-limit 20000:20000 --allow-permission-change chmodAndSetAcl' RC=0
    Apr 1 02:20:21 r-x-master systemd: Started Session c32460 of user root.
    Apr 1 02:20:22 r-x-master systemd: Started Session c32461 of user root.
    Apr 1 02:20:22 r-x-master mmfs[26890]: REST-CLI root admin [EXIT, CHANGE] 'mmlinkfileset fs2 pvc-467ecd6c-2fe1-461e-be54-a601e9d48803 -J /mnt/fs2/pvc-467ecd6c-2fe1-461e-be54-a601e9d48803' RC=0
    Apr 1 02:20:24 r-x-master mmfs[27187]: REST-CLI root admin [EXIT, CHANGE] 'mmsetquota fs2:pvc-467ecd6c-2fe1-461e-be54-a601e9d48803 --block 5368709120:5368709120' RC=0
    Apr 1 02:20:34 r-x-master mmfs[28665]: REST-CLI root admin [ENTRY, CHANGE] 'mmapplypolicy /mnt/fs2/pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee/.snapshots/snapshot-0ac7f259-7f4d-4ca2-81fd-bddbc9d71f79/pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee-data -P /var/mmfs/tmp/cmdTmpDir.mmxcp.28202/tmpPolicyFile -N 10.11.113.209 -m 1 --scope=inodespace'
    Apr 1 02:20:37 r-x-master mmfs: [N] Expanding fs2 inode space 12 current 0 inodes (0 free) by 20480
    Apr 1 02:20:40 r-x-master mmfs: [N] Expanded fs2 inode space 12 from 0 to 20480 inodes (on-demand).
    Apr 1 02:20:40 r-x-master mmfs[29101]: REST-CLI root admin [EXIT, CHANGE] 'mmcrfileset fs2 pvc-239078f2-b0fc-4e59-9d16-4a8767e54255 -t Fileset created by IBM Container Storage Interface driver --inode-space new --inode-limit 20000:20000 --allow-permission-change chmodAndSetAcl' RC=0
    Apr 1 02:20:42 r-x-master mmfs[29491]: REST-CLI root admin [EXIT, CHANGE] 'mmlinkfileset fs2 pvc-239078f2-b0fc-4e59-9d16-4a8767e54255 -J /mnt/fs2/pvc-239078f2-b0fc-4e59-9d16-4a8767e54255' RC=0
    Apr 1 02:20:44 r-x-master mmfs[29944]: REST-CLI root admin [EXIT, CHANGE] 'mmsetquota fs2:pvc-239078f2-b0fc-4e59-9d16-4a8767e54255 --block 5368709120:5368709120' RC=0
    Apr 1 02:20:53 r-x-master mmfs[32030]: REST-CLI root admin [ENTRY, CHANGE] 'mmapplypolicy /mnt/fs2/pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee/.snapshots/snapshot-0ac7f259-7f4d-4ca2-81fd-bddbc9d71f79/pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee-data -P /var/mmfs/tmp/cmdTmpDir.mmxcp.31321/tmpPolicyFile -N 10.11.113.209 -m 1 --scope=inodespace'

[root@r-x-master 2021_04_01-02:20:46 restore-10k-files]$ mmxcp list all PARALLEL_COPY_ID:XCP1616732035 PARALLEL_COPY_SOURCE_PATH:/mnt/fs2/pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee/.snapshots/snapshot-0ac7f259-7f4d-4ca2-81fd-bddbc9d71f79/pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee-data PARALLEL_COPY_TARGET_PATH:/mnt/fs2/pvc-4141dba5-e9e1-4c27-97bd-d941bc67f147/pvc-4141dba5-e9e1-4c27-97bd-d941bc67f147-data PARALLEL_COPY_SOURCE_DEVICE:fs2 PARALLEL_COPY_TARGET_DEVICE:fs2 PARALLEL_COPY_NODE_LIST:10.11.113.209 PARALLEL_COPY_START_TIME:Thu Mar 25 21-13-55 2021 PARALLEL_COPY_ID:XCP1616732112 PARALLEL_COPY_SOURCE_PATH:/mnt/fs2/pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee/.snapshots/snapshot-0ac7f259-7f4d-4ca2-81fd-bddbc9d71f79/pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee-data PARALLEL_COPY_TARGET_PATH:/mnt/fs2/pvc-80a299e6-85bb-44e2-8d50-27147a35d289/pvc-80a299e6-85bb-44e2-8d50-27147a35d289-data PARALLEL_COPY_SOURCE_DEVICE:fs2 PARALLEL_COPY_TARGET_DEVICE:fs2 PARALLEL_COPY_NODE_LIST:10.11.113.209 PARALLEL_COPY_START_TIME:Thu Mar 25 21-15-12 2021 PARALLEL_COPY_ID:XCP1616739442 PARALLEL_COPY_SOURCE_PATH:/mnt/fs2/pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee/.snapshots/snapshot-0ac7f259-7f4d-4ca2-81fd-bddbc9d71f79/pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee-data PARALLEL_COPY_TARGET_PATH:/mnt/fs2/dir4 PARALLEL_COPY_SOURCE_DEVICE:fs2 PARALLEL_COPY_TARGET_DEVICE:fs2 PARALLEL_COPY_NODE_LIST:10.11.113.209 PARALLEL_COPY_START_TIME:Thu Mar 25 23-17-22 2021 PARALLEL_COPY_ID:XCP1617268833 PARALLEL_COPY_SOURCE_PATH:/mnt/fs2/pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee/.snapshots/snapshot-0ac7f259-7f4d-4ca2-81fd-bddbc9d71f79/pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee-data PARALLEL_COPY_TARGET_PATH:/mnt/fs2/pvc-467ecd6c-2fe1-461e-be54-a601e9d48803/pvc-467ecd6c-2fe1-461e-be54-a601e9d48803-data PARALLEL_COPY_SOURCE_DEVICE:fs2 PARALLEL_COPY_TARGET_DEVICE:fs2 PARALLEL_COPY_NODE_LIST:10.11.113.209 PARALLEL_COPY_START_TIME:Thu Apr 01 02-20-33 2021 PARALLEL_COPY_ID:XCP1617268852 PARALLEL_COPY_SOURCE_PATH:/mnt/fs2/pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee/.snapshots/snapshot-0ac7f259-7f4d-4ca2-81fd-bddbc9d71f79/pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee-data PARALLEL_COPY_TARGET_PATH:/mnt/fs2/pvc-239078f2-b0fc-4e59-9d16-4a8767e54255/pvc-239078f2-b0fc-4e59-9d16-4a8767e54255-data PARALLEL_COPY_SOURCE_DEVICE:fs2 PARALLEL_COPY_TARGET_DEVICE:fs2 PARALLEL_COPY_NODE_LIST:10.11.113.209 PARALLEL_COPY_START_TIME:Thu Apr 01 02-20-52 2021 [root@r-x-master 2021_04_01-02:20:59 restore-10k-files]$ [root@r-x-master 2021_04_01-02:21:00 restore-10k-files]$ mmccr fget _xcpRunning /tmp/1 fget:499 [root@r-x-master 2021_04_01-02:21:04 restore-10k-files]$ cat /tmp/1 XCP1616732035;%2Fmnt%2Ffs2%2Fpvc%2D44e12b16%2D42d0%2D41fb%2Dbe9f%2D1b67b2b64bee%2F%2Esnapshots%2Fsnapshot%2D0ac7f259%2D7f4d%2D4ca2%2D81fd%2Dbddbc9d71f79%2Fpvc%2D44e12b16%2D42d0%2D41fb%2Dbe9f%2D1b67b2b64bee%2Ddata;%2Fmnt%2Ffs2%2Fpvc%2D4141dba5%2De9e1%2D4c27%2D97bd%2Dd941bc67f147%2Fpvc%2D4141dba5%2De9e1%2D4c27%2D97bd%2Dd941bc67f147%2Ddata;fs2;fs2;10.11.113.209;Thu%20Mar%2025%2021%2D13%2D55%202021;RESERVED;RESERVED;RESERVED;RESERVED;RESERVED; XCP1616732112;%2Fmnt%2Ffs2%2Fpvc%2D44e12b16%2D42d0%2D41fb%2Dbe9f%2D1b67b2b64bee%2F%2Esnapshots%2Fsnapshot%2D0ac7f259%2D7f4d%2D4ca2%2D81fd%2Dbddbc9d71f79%2Fpvc%2D44e12b16%2D42d0%2D41fb%2Dbe9f%2D1b67b2b64bee%2Ddata;%2Fmnt%2Ffs2%2Fpvc%2D80a299e6%2D85bb%2D44e2%2D8d50%2D27147a35d289%2Fpvc%2D80a299e6%2D85bb%2D44e2%2D8d50%2D27147a35d289%2Ddata;fs2;fs2;10.11.113.209;Thu%20Mar%2025%2021%2D15%2D12%202021;RESERVED;RESERVED;RESERVED;RESERVED;RESERVED; 
XCP1616739442;%2Fmnt%2Ffs2%2Fpvc%2D44e12b16%2D42d0%2D41fb%2Dbe9f%2D1b67b2b64bee%2F%2Esnapshots%2Fsnapshot%2D0ac7f259%2D7f4d%2D4ca2%2D81fd%2Dbddbc9d71f79%2Fpvc%2D44e12b16%2D42d0%2D41fb%2Dbe9f%2D1b67b2b64bee%2Ddata;%2Fmnt%2Ffs2%2Fdir4;fs2;fs2;10.11.113.209;Thu%20Mar%2025%2023%2D17%2D22%202021;RESERVED;RESERVED;RESERVED;RESERVED;RESERVED; XCP1617268833;%2Fmnt%2Ffs2%2Fpvc%2D44e12b16%2D42d0%2D41fb%2Dbe9f%2D1b67b2b64bee%2F%2Esnapshots%2Fsnapshot%2D0ac7f259%2D7f4d%2D4ca2%2D81fd%2Dbddbc9d71f79%2Fpvc%2D44e12b16%2D42d0%2D41fb%2Dbe9f%2D1b67b2b64bee%2Ddata;%2Fmnt%2Ffs2%2Fpvc%2D467ecd6c%2D2fe1%2D461e%2Dbe54%2Da601e9d48803%2Fpvc%2D467ecd6c%2D2fe1%2D461e%2Dbe54%2Da601e9d48803%2Ddata;fs2;fs2;10.11.113.209;Thu%20Apr%2001%2002%2D20%2D33%202021;RESERVED;RESERVED;RESERVED;RESERVED;RESERVED; XCP1617268852;%2Fmnt%2Ffs2%2Fpvc%2D44e12b16%2D42d0%2D41fb%2Dbe9f%2D1b67b2b64bee%2F%2Esnapshots%2Fsnapshot%2D0ac7f259%2D7f4d%2D4ca2%2D81fd%2Dbddbc9d71f79%2Fpvc%2D44e12b16%2D42d0%2D41fb%2Dbe9f%2D1b67b2b64bee%2Ddata;%2Fmnt%2Ffs2%2Fpvc%2D239078f2%2Db0fc%2D4e59%2D9d16%2D4a8767e54255%2Fpvc%2D239078f2%2Db0fc%2D4e59%2D9d16%2D4a8767e54255%2Ddata;fs2;fs2;10.11.113.209;Thu%20Apr%2001%2002%2D20%2D52%202021;RESERVED;RESERVED;RESERVED;RESERVED;RESERVED; [root@r-x-master 2021_04_01-02:21:06 restore-10k-files]$
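
The `_xcpRunning` records above are percent-encoded, which makes them hard to compare with the `mmxcp list all` output. A small decoding sketch, assuming `python3` is available on the node:

```bash
# Fetch the CCR file and print each copy record with its fields decoded.
# Assumes python3 on the node; the field layout is taken from the raw records
# shown above (id; source; target; source fs; target fs; nodes; start time).
mmccr fget _xcpRunning /tmp/xcpRunning
python3 -c '
import urllib.parse
for line in open("/tmp/xcpRunning"):
    for record in line.split():
        fields = [urllib.parse.unquote(f) for f in record.split(";") if f and f != "RESERVED"]
        print(" | ".join(fields))
'
```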


2. Reboot the node where mmxcp is running (see the sketch after this step for identifying that node)

[root@r-x-master 2021_04_01-02:21:10 restore-10k-files]$ ps -eaf | grep mmxcp root 3312 7741 0 02:21 pts/1 00:00:00 grep --color=auto mmxcp scalemg+ 28191 30662 0 02:20 ? 00:00:00 /bin/bash -c set -m; sudo mmxcp enable --source 'pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee-data' --snapshot 'fs2:pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee:snapshot-0ac7f259-7f4d-4ca2-81fd-bddbc9d71f79' --target '/mnt/fs2/pvc-467ecd6c-2fe1-461e-be54-a601e9d48803/pvc-467ecd6c-2fe1-461e-be54-a601e9d48803-data' -N 'snapclass' root 28195 28191 0 02:20 ? 00:00:00 sudo mmxcp enable --source pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee-data --snapshot fs2:pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee:snapshot-0ac7f259-7f4d-4ca2-81fd-bddbc9d71f79 --target /mnt/fs2/pvc-467ecd6c-2fe1-461e-be54-a601e9d48803/pvc-467ecd6c-2fe1-461e-be54-a601e9d48803-data -N snapclass root 28202 28195 0 02:20 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmxcp enable --source pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee-data --snapshot fs2:pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee:snapshot-0ac7f259-7f4d-4ca2-81fd-bddbc9d71f79 --target /mnt/fs2/pvc-467ecd6c-2fe1-461e-be54-a601e9d48803/pvc-467ecd6c-2fe1-461e-be54-a601e9d48803-data -N snapclass root 28614 28202 0 02:20 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmapplypolicy /mnt/fs2/pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee/.snapshots/snapshot-0ac7f259-7f4d-4ca2-81fd-bddbc9d71f79/pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee-data -P /var/mmfs/tmp/cmdTmpDir.mmxcp.28202/tmpPolicyFile -N 10.11.113.209 -m 1 --scope=inodespace root 28753 28614 0 02:20 ? 00:00:00 /usr/lpp/mmfs/bin/tsapolicy /mnt/fs2/pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee/.snapshots/snapshot-0ac7f259-7f4d-4ca2-81fd-bddbc9d71f79/pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee-data -m 1 --scope inode-space -g /mnt/fs2/.mmSharedTmpDir -P /var/mmfs/tmp/cmdTmpDir.mmxcp.28202/tmpPolicyFile -I yes -L 1 --enableIPv6 no -N /var/mmfs/tmp/nodefile.mmapplypolicy.28614 -X 10.11.113.188 root 28760 28753 0 02:20 ? 00:00:00 /usr/lpp/mmfs/bin/tsapolicy /mnt/fs2/pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee/.snapshots/snapshot-0ac7f259-7f4d-4ca2-81fd-bddbc9d71f79/pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee-data -m 1 --scope inode-space -g /mnt/fs2/.mmSharedTmpDir -P /var/mmfs/tmp/cmdTmpDir.mmxcp.28202/tmpPolicyFile -I yes -L 1 --enableIPv6 no -N /var/mmfs/tmp/nodefile.mmapplypolicy.28614 -X 10.11.113.188 scalemg+ 31317 30662 0 02:20 ? 00:00:00 /bin/bash -c set -m; sudo mmxcp enable --source 'pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee-data' --snapshot 'fs2:pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee:snapshot-0ac7f259-7f4d-4ca2-81fd-bddbc9d71f79' --target '/mnt/fs2/pvc-239078f2-b0fc-4e59-9d16-4a8767e54255/pvc-239078f2-b0fc-4e59-9d16-4a8767e54255-data' -N 'snapclass' root 31320 31317 0 02:20 ? 00:00:00 sudo mmxcp enable --source pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee-data --snapshot fs2:pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee:snapshot-0ac7f259-7f4d-4ca2-81fd-bddbc9d71f79 --target /mnt/fs2/pvc-239078f2-b0fc-4e59-9d16-4a8767e54255/pvc-239078f2-b0fc-4e59-9d16-4a8767e54255-data -N snapclass root 31321 31320 0 02:20 ? 00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmxcp enable --source pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee-data --snapshot fs2:pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee:snapshot-0ac7f259-7f4d-4ca2-81fd-bddbc9d71f79 --target /mnt/fs2/pvc-239078f2-b0fc-4e59-9d16-4a8767e54255/pvc-239078f2-b0fc-4e59-9d16-4a8767e54255-data -N snapclass root 31946 31321 0 02:20 ? 
00:00:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmapplypolicy /mnt/fs2/pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee/.snapshots/snapshot-0ac7f259-7f4d-4ca2-81fd-bddbc9d71f79/pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee-data -P /var/mmfs/tmp/cmdTmpDir.mmxcp.31321/tmpPolicyFile -N 10.11.113.209 -m 1 --scope=inodespace root 32163 31946 0 02:20 ? 00:00:00 /usr/lpp/mmfs/bin/tsapolicy /mnt/fs2/pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee/.snapshots/snapshot-0ac7f259-7f4d-4ca2-81fd-bddbc9d71f79/pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee-data -m 1 --scope inode-space -g /mnt/fs2/.mmSharedTmpDir -P /var/mmfs/tmp/cmdTmpDir.mmxcp.31321/tmpPolicyFile -I yes -L 1 --enableIPv6 no -N /var/mmfs/tmp/nodefile.mmapplypolicy.31946 -X 10.11.113.188 root 32171 32163 0 02:20 ? 00:00:00 /usr/lpp/mmfs/bin/tsapolicy /mnt/fs2/pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee/.snapshots/snapshot-0ac7f259-7f4d-4ca2-81fd-bddbc9d71f79/pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee-data -m 1 --scope inode-space -g /mnt/fs2/.mmSharedTmpDir -P /var/mmfs/tmp/cmdTmpDir.mmxcp.31321/tmpPolicyFile -I yes -L 1 --enableIPv6 no -N /var/mmfs/tmp/nodefile.mmapplypolicy.31946 -X 10.11.113.188 [root@r-x-master 2021_04_01-02:21:15 restore-10k-files]$ [root@r-x-master 2021_04_01-02:22:17 restore-10k-files]$ find /mnt/fs2/pvc-467ecd6c-2fe1-461e-be54-a601e9d48803/pvc-467ecd6c-2fe1-461e-be54-a601e9d48803-data/ -type f -print | wc -l 5646 [root@r-x-master 2021_04_01-02:22:21 restore-10k-files]$ find /mnt/fs2/pvc-239078f2-b0fc-4e59-9d16-4a8767e54255/pvc-239078f2-b0fc-4e59-9d16-4a8767e54255-data/ -type f -print | wc -l 5121 [root@r-x-master 2021_04_01-02:22:24 restore-10k-files]$ [root@r-x-master 2021_04_01-02:22:27 restore-10k-files]$ shutdown -r now

Session stopped
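
For repeat runs of step 2, the node to reboot can be identified with the same commands used above. A brief sketch; the address used here is the one reported for the copy jobs in this run:

```bash
# Identify the node currently executing the parallel copy before rebooting it.
mmxcp list all | grep -o 'PARALLEL_COPY_NODE_LIST:[^ ]*'
COPY_NODE=10.11.113.209   # node list reported for the copy jobs in this run
ssh "$COPY_NODE" 'ps -eaf | grep "[m]mxcp"'
```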


3. Verify that the PVCs go to Bound state after the reboot
Observed that the PVCs remained in Pending state, the remaining files were not copied after the reboot, and no new mmxcp process was triggered. (The individual post-reboot checks are consolidated into a sketch after this step's output.)

    [root@r-x-master 2021_04_01-02:25:41 ~]$ mmdiag --version
    === mmdiag: version ===
    Current GPFS build: "5.1.1.0 210329.152612".
    Built on Mar 29 2021 at 15:39:34
    Running 1 minute 46 secs, pid 4262
    [root@r-x-master 2021_04_01-02:25:45 ~]$
    [root@r-x-master 2021_04_01-02:25:45 ~]$ mmccr flist
    version  name


    1  ccr.nodes
    1  ccr.disks
    1  mmLockFileDB
    1  genKeyData
    1  genKeyDataNew
   49  mmsdrfs
    1  mmsysmon.json
    1  zmrules.json
    6  collectors
   23  _gui.settings
   13  _gui.user.repo
    1  _callhomeconfig
   67  _gui.keystore_settings
    3  gpfs.install.clusterdefinition.txt
   18  _gui.snapshots
  499  _xcpRunning
  488  _perfmon.keys

[root@r-x-master 2021_04_01-02:25:48 ~]$ [root@r-x-master 2021_04_01-02:25:50 ~]$ mmccr fget _xcpRunning /tmp/2 fget:499 [root@r-x-master 2021_04_01-02:25:55 ~]$ cat /tmp/2 XCP1616732035;%2Fmnt%2Ffs2%2Fpvc%2D44e12b16%2D42d0%2D41fb%2Dbe9f%2D1b67b2b64bee%2F%2Esnapshots%2Fsnapshot%2D0ac7f259%2D7f4d%2D4ca2%2D81fd%2Dbddbc9d71f79%2Fpvc%2D44e12b16%2D42d0%2D41fb%2Dbe9f%2D1b67b2b64bee%2Ddata;%2Fmnt%2Ffs2%2Fpvc%2D4141dba5%2De9e1%2D4c27%2D97bd%2Dd941bc67f147%2Fpvc%2D4141dba5%2De9e1%2D4c27%2D97bd%2Dd941bc67f147%2Ddata;fs2;fs2;10.11.113.209;Thu%20Mar%2025%2021%2D13%2D55%202021;RESERVED;RESERVED;RESERVED;RESERVED;RESERVED; XCP1616732112;%2Fmnt%2Ffs2%2Fpvc%2D44e12b16%2D42d0%2D41fb%2Dbe9f%2D1b67b2b64bee%2F%2Esnapshots%2Fsnapshot%2D0ac7f259%2D7f4d%2D4ca2%2D81fd%2Dbddbc9d71f79%2Fpvc%2D44e12b16%2D42d0%2D41fb%2Dbe9f%2D1b67b2b64bee%2Ddata;%2Fmnt%2Ffs2%2Fpvc%2D80a299e6%2D85bb%2D44e2%2D8d50%2D27147a35d289%2Fpvc%2D80a299e6%2D85bb%2D44e2%2D8d50%2D27147a35d289%2Ddata;fs2;fs2;10.11.113.209;Thu%20Mar%2025%2021%2D15%2D12%202021;RESERVED;RESERVED;RESERVED;RESERVED;RESERVED; XCP1616739442;%2Fmnt%2Ffs2%2Fpvc%2D44e12b16%2D42d0%2D41fb%2Dbe9f%2D1b67b2b64bee%2F%2Esnapshots%2Fsnapshot%2D0ac7f259%2D7f4d%2D4ca2%2D81fd%2Dbddbc9d71f79%2Fpvc%2D44e12b16%2D42d0%2D41fb%2Dbe9f%2D1b67b2b64bee%2Ddata;%2Fmnt%2Ffs2%2Fdir4;fs2;fs2;10.11.113.209;Thu%20Mar%2025%2023%2D17%2D22%202021;RESERVED;RESERVED;RESERVED;RESERVED;RESERVED; XCP1617268833;%2Fmnt%2Ffs2%2Fpvc%2D44e12b16%2D42d0%2D41fb%2Dbe9f%2D1b67b2b64bee%2F%2Esnapshots%2Fsnapshot%2D0ac7f259%2D7f4d%2D4ca2%2D81fd%2Dbddbc9d71f79%2Fpvc%2D44e12b16%2D42d0%2D41fb%2Dbe9f%2D1b67b2b64bee%2Ddata;%2Fmnt%2Ffs2%2Fpvc%2D467ecd6c%2D2fe1%2D461e%2Dbe54%2Da601e9d48803%2Fpvc%2D467ecd6c%2D2fe1%2D461e%2Dbe54%2Da601e9d48803%2Ddata;fs2;fs2;10.11.113.209;Thu%20Apr%2001%2002%2D20%2D33%202021;RESERVED;RESERVED;RESERVED;RESERVED;RESERVED; XCP1617268852;%2Fmnt%2Ffs2%2Fpvc%2D44e12b16%2D42d0%2D41fb%2Dbe9f%2D1b67b2b64bee%2F%2Esnapshots%2Fsnapshot%2D0ac7f259%2D7f4d%2D4ca2%2D81fd%2Dbddbc9d71f79%2Fpvc%2D44e12b16%2D42d0%2D41fb%2Dbe9f%2D1b67b2b64bee%2Ddata;%2Fmnt%2Ffs2%2Fpvc%2D239078f2%2Db0fc%2D4e59%2D9d16%2D4a8767e54255%2Fpvc%2D239078f2%2Db0fc%2D4e59%2D9d16%2D4a8767e54255%2Ddata;fs2;fs2;10.11.113.209;Thu%20Apr%2001%2002%2D20%2D52%202021;RESERVED;RESERVED;RESERVED;RESERVED;RESERVED; [root@r-x-master 2021_04_01-02:25:56 ~]$ [root@r-x-master 2021_04_01-02:25:58 ~]$ diff /tmp/1 /tmp/2 [root@r-x-master 2021_04_01-02:26:01 ~]$ mmxcp list all PARALLEL_COPY_ID:XCP1616732035 PARALLEL_COPY_SOURCE_PATH:/mnt/fs2/pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee/.snapshots/snapshot-0ac7f259-7f4d-4ca2-81fd-bddbc9d71f79/pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee-data PARALLEL_COPY_TARGET_PATH:/mnt/fs2/pvc-4141dba5-e9e1-4c27-97bd-d941bc67f147/pvc-4141dba5-e9e1-4c27-97bd-d941bc67f147-data PARALLEL_COPY_SOURCE_DEVICE:fs2 PARALLEL_COPY_TARGET_DEVICE:fs2 PARALLEL_COPY_NODE_LIST:10.11.113.209 PARALLEL_COPY_START_TIME:Thu Mar 25 21-13-55 2021 PARALLEL_COPY_ID:XCP1616732112 PARALLEL_COPY_SOURCE_PATH:/mnt/fs2/pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee/.snapshots/snapshot-0ac7f259-7f4d-4ca2-81fd-bddbc9d71f79/pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee-data PARALLEL_COPY_TARGET_PATH:/mnt/fs2/pvc-80a299e6-85bb-44e2-8d50-27147a35d289/pvc-80a299e6-85bb-44e2-8d50-27147a35d289-data PARALLEL_COPY_SOURCE_DEVICE:fs2 PARALLEL_COPY_TARGET_DEVICE:fs2 PARALLEL_COPY_NODE_LIST:10.11.113.209 PARALLEL_COPY_START_TIME:Thu Mar 25 21-15-12 2021 PARALLEL_COPY_ID:XCP1616739442 
PARALLEL_COPY_SOURCE_PATH:/mnt/fs2/pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee/.snapshots/snapshot-0ac7f259-7f4d-4ca2-81fd-bddbc9d71f79/pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee-data PARALLEL_COPY_TARGET_PATH:/mnt/fs2/dir4 PARALLEL_COPY_SOURCE_DEVICE:fs2 PARALLEL_COPY_TARGET_DEVICE:fs2 PARALLEL_COPY_NODE_LIST:10.11.113.209 PARALLEL_COPY_START_TIME:Thu Mar 25 23-17-22 2021 PARALLEL_COPY_ID:XCP1617268833 PARALLEL_COPY_SOURCE_PATH:/mnt/fs2/pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee/.snapshots/snapshot-0ac7f259-7f4d-4ca2-81fd-bddbc9d71f79/pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee-data PARALLEL_COPY_TARGET_PATH:/mnt/fs2/pvc-467ecd6c-2fe1-461e-be54-a601e9d48803/pvc-467ecd6c-2fe1-461e-be54-a601e9d48803-data PARALLEL_COPY_SOURCE_DEVICE:fs2 PARALLEL_COPY_TARGET_DEVICE:fs2 PARALLEL_COPY_NODE_LIST:10.11.113.209 PARALLEL_COPY_START_TIME:Thu Apr 01 02-20-33 2021 PARALLEL_COPY_ID:XCP1617268852 PARALLEL_COPY_SOURCE_PATH:/mnt/fs2/pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee/.snapshots/snapshot-0ac7f259-7f4d-4ca2-81fd-bddbc9d71f79/pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee-data PARALLEL_COPY_TARGET_PATH:/mnt/fs2/pvc-239078f2-b0fc-4e59-9d16-4a8767e54255/pvc-239078f2-b0fc-4e59-9d16-4a8767e54255-data PARALLEL_COPY_SOURCE_DEVICE:fs2 PARALLEL_COPY_TARGET_DEVICE:fs2 PARALLEL_COPY_NODE_LIST:10.11.113.209 PARALLEL_COPY_START_TIME:Thu Apr 01 02-20-52 2021 [root@r-x-master 2021_04_01-02:26:05 ~]$ [root@r-x-master 2021_04_01-02:26:06 ~]$ ps -eaf | grep mmxcp root 22984 11222 0 02:26 pts/0 00:00:00 grep --color=auto mmxcp [root@r-x-master 2021_04_01-02:26:11 ~]$ [root@r-x-master 2021_04_01-02:26:12 ~]$ ssh r-x-worker1 " ps -eaf | grep mmxcp " root 32296 32294 0 02:26 ? 00:00:00 bash -c ps -eaf | grep mmxcp root 32300 32296 0 02:26 ? 00:00:00 grep mmxcp [root@r-x-master 2021_04_01-02:26:18 ~]$ [root@r-x-master 2021_04_01-02:26:20 ~]$ find /mnt/fs2/pvc-239078f2-b0fc-4e59-9d16-4a8767e54255/pvc-239078f2-b0fc-4e59-9d16-4a8767e54255-data/ -type f -print | wc -l 5646 (reverse-i-search)t': diff /tmp/1 /^Cp/2 (reverse-i-search)type ': find /mnt/fs2/pvc-239078f2-b0fc-4e59-9d16-4a8767e54255/pvc-239078f2-b0fc-4e59-9d16-4a8767e54255-data/ -^Cpe f -print | wc -l [root@r-x-master 2021_04_01-02:26:41 ~]$ [root@r-x-master 2021_04_01-02:27:00 ~]$ find /mnt/fs2/pvc-467ecd6c-2fe1-461e-be54-a601e9d48803/pvc-467ecd6c-2fe1-461e-be54-a601e9d48803-data/ -type f -print | wc -l 5646 [root@r-x-master 2021_04_01-02:27:01 ~]$ find /mnt/fs2/pvc-239078f2-b0fc-4e59-9d16-4a8767e54255/pvc-239078f2-b0fc-4e59-9d16-4a8767e54255-data/ -type f -print | wc -l 5646 [root@r-x-master 2021_04_01-02:27:06 ~]$ find /mnt/fs2/pvc-467ecd6c-2fe1-461e-be54-a601e9d48803/pvc-467ecd6c-2fe1-461e-be54-a601e9d48803-data/ -type f -print | wc -l 5646 [root@r-x-master 2021_04_01-02:27:13 ~]$ find /mnt/fs2/pvc-239078f2-b0fc-4e59-9d16-4a8767e54255/pvc-239078f2-b0fc-4e59-9d16-4a8767e54255-data/ -type f -print | wc -l 5646 [root@r-x-master 2021_04_01-02:27:14 ~]$
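
The individual post-reboot checks above can be collected into one short script for re-runs of this scenario. This is only a convenience sketch built from commands already shown in this report; the target path below is one of the filesets being restored here and should be substituted as needed.

```bash
# Post-reboot health check for an in-flight snapshot restore (sketch).
echo "== copy jobs recorded in CCR =="
mmxcp list all
mmccr fget _xcpRunning /tmp/xcpRunning && cat /tmp/xcpRunning

echo "== mmxcp processes on this node =="
ps -eaf | grep '[m]mxcp'

echo "== files copied so far into the target fileset =="
# Substitute the PARALLEL_COPY_TARGET_PATH of the copy job being checked.
TARGET_PATH=/mnt/fs2/pvc-467ecd6c-2fe1-461e-be54-a601e9d48803/pvc-467ecd6c-2fe1-461e-be54-a601e9d48803-data
find "$TARGET_PATH" -type f -print | wc -l
```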

    [root@r-x-master 2021_04_01-02:46:00 restore-10k-files]$ knpvc
    NAME                                  STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS       AGE
    fix-restore2-vs1-fs2-million-to-fs3   Bound     pvc-c1889426-37b7-4ac7-bcec-e3fb149f0529   20Gi       RWX            sc4-fs3-million    7d5h
    pvc1-fs2-million                      Bound     pvc-5e28a55d-ce94-44a7-bfee-9eb0b21fe794   20Gi       RWX            sc-fs2-million     10d
    pvc1-fs2-tenk                         Bound     pvc-44e12b16-42d0-41fb-be9f-1b67b2b64bee   5Gi        RWX            sc-fs2-tenk        10d
    res21-tenk                            Pending                                                                        sc-fs2-snapclass   6d5h
    res22-tenk                            Pending                                                                        sc-fs2-snapclass   6d5h
    res31-tenk                            Pending                                                                        sc-fs2-snapclass   25m
    res32-tenk                            Pending                                                                        sc-fs2-snapclass   25m
    [root@r-x-master 2021_04_01-02:46:03 restore-10k-files]$
    [root@r-x-master 2021_04_01-02:46:03 restore-10k-files]$ kn describe pvc res32-tenk
    Name:          res32-tenk
    Namespace:     ibm-spectrum-scale-csi-driver
    StorageClass:  sc-fs2-snapclass
    Status:        Pending
    Volume:
    Labels:
    Annotations:   volume.beta.kubernetes.io/storage-provisioner: spectrumscale.csi.ibm.com
    Finalizers:    [kubernetes.io/pvc-protection]
    Capacity:
    Access Modes:
    VolumeMode:    Filesystem
    DataSource:
      APIGroup:  snapshot.storage.k8s.io
      Kind:      VolumeSnapshot
      Name:      vs1-fs2-tenk
    Used By:
    Events:
      Type     Reason                Age                   From   Message
      Normal   ExternalProvisioning  23m (x9 over 25m)     persistentvolume-controller   waiting for a volume to be created, either by external provisioner "spectrumscale.csi.ibm.com" or manually created by system administrator
      Warning  ProvisioningFailed    20m (x7 over 23m)     spectrumscale.csi.ibm.com_ibm-spectrum-scale-csi-provisioner-0_4c9ddea4-da2a-4ca8-9fc0-7e52d0f2e088   failed to provision volume with StorageClass "sc-fs2-snapclass": error getting handle for DataSource Type VolumeSnapshot by Name vs1-fs2-tenk: error getting snapshot vs1-fs2-tenk from api server: Get "https://10.96.0.1:443/apis/snapshot.storage.k8s.io/v1beta1/namespaces/ibm-spectrum-scale-csi-driver/volumesnapshots/vs1-fs2-tenk": dial tcp 10.96.0.1:443: connect: connection refused
      Normal   Provisioning          4m29s (x12 over 25m)  spectrumscale.csi.ibm.com_ibm-spectrum-scale-csi-provisioner-0_4c9ddea4-da2a-4ca8-9fc0-7e52d0f2e088   External provisioner is provisioning volume for claim "ibm-spectrum-scale-csi-driver/res32-tenk"
      Warning  ProvisioningFailed    4m29s (x4 over 18m)   spectrumscale.csi.ibm.com_ibm-spectrum-scale-csi-provisioner-0_4c9ddea4-da2a-4ca8-9fc0-7e52d0f2e088   failed to provision volume with StorageClass "sc-fs2-snapclass": rpc error: code = Internal desc = snapshot copy job had failed for snapshot: snapshot-0ac7f259-7f4d-4ca2-81fd-bddbc9d71f79
      Normal   ExternalProvisioning  3m24s (x62 over 18m)  persistentvolume-controller   waiting for a volume to be created, either by external provisioner "spectrumscale.csi.ibm.com" or manually created by system administrator
    [root@r-x-master 2021_04_01-02:46:04 restore-10k-files]$
    [root@r-x-master 2021_04_01-02:46:06 restore-10k-files]$ kn describe pvc res31-tenk
    Name:          res31-tenk
    Namespace:     ibm-spectrum-scale-csi-driver
    StorageClass:  sc-fs2-snapclass
    Status:        Pending
    Volume:
    Labels:
    Annotations:   volume.beta.kubernetes.io/storage-provisioner: spectrumscale.csi.ibm.com
    Finalizers:    [kubernetes.io/pvc-protection]
    Capacity:
    Access Modes:
    VolumeMode:    Filesystem
    DataSource:
      APIGroup:  snapshot.storage.k8s.io
      Kind:      VolumeSnapshot
      Name:      vs1-fs2-tenk
    Used By:
    Events:
      Type     Reason                Age                   From   Message
      Warning  ProvisioningFailed    23m                   spectrumscale.csi.ibm.com_ibm-spectrum-scale-csi-provisioner-0_4c9ddea4-da2a-4ca8-9fc0-7e52d0f2e088   failed to provision volume with StorageClass "sc-fs2-snapclass": rpc error: code = DeadlineExceeded desc = context deadline exceeded
      Normal   ExternalProvisioning  23m (x11 over 25m)    persistentvolume-controller   waiting for a volume to be created, either by external provisioner "spectrumscale.csi.ibm.com" or manually created by system administrator
      Warning  ProvisioningFailed    23m (x4 over 23m)     spectrumscale.csi.ibm.com_ibm-spectrum-scale-csi-provisioner-0_4c9ddea4-da2a-4ca8-9fc0-7e52d0f2e088   failed to provision volume with StorageClass "sc-fs2-snapclass": rpc error: code = Aborted desc = volume creation already in process : pvc-467ecd6c-2fe1-461e-be54-a601e9d48803
      Warning  ProvisioningFailed    19m (x4 over 23m)     spectrumscale.csi.ibm.com_ibm-spectrum-scale-csi-provisioner-0_4c9ddea4-da2a-4ca8-9fc0-7e52d0f2e088   failed to provision volume with StorageClass "sc-fs2-snapclass": error getting handle for DataSource Type VolumeSnapshot by Name vs1-fs2-tenk: error getting snapshot vs1-fs2-tenk from api server: Get "https://10.96.0.1:443/apis/snapshot.storage.k8s.io/v1beta1/namespaces/ibm-spectrum-scale-csi-driver/volumesnapshots/vs1-fs2-tenk": dial tcp 10.96.0.1:443: connect: connection refused
      Normal   ExternalProvisioning  3m45s (x61 over 18m)  persistentvolume-controller   waiting for a volume to be created, either by external provisioner "spectrumscale.csi.ibm.com" or manually created by system administrator
      Normal   Provisioning          20s (x13 over 25m)    spectrumscale.csi.ibm.com_ibm-spectrum-scale-csi-provisioner-0_4c9ddea4-da2a-4ca8-9fc0-7e52d0f2e088   External provisioner is provisioning volume for claim "ibm-spectrum-scale-csi-driver/res31-tenk"
      Warning  ProvisioningFailed    19s (x4 over 15m)     spectrumscale.csi.ibm.com_ibm-spectrum-scale-csi-provisioner-0_4c9ddea4-da2a-4ca8-9fc0-7e52d0f2e088   failed to provision volume with StorageClass "sc-fs2-snapclass": rpc error: code = Internal desc = snapshot copy job had failed for snapshot: snapshot-0ac7f259-7f4d-4ca2-81fd-bddbc9d71f79
    [root@r-x-master 2021_04_01-02:46:10 restore-10k-files]$
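
Because the events point at the external provisioner, the driver-side logs around the failed copy job are worth attaching as well. A hedged collection sketch: the pod name `ibm-spectrum-scale-csi-provisioner-0` appears in the events above, but the container layout of the deployment is an assumption and may differ.

```bash
# Collect CSI provisioner logs around the failed snapshot copy (sketch).
kubectl -n ibm-spectrum-scale-csi-driver get pods -o wide
kubectl -n ibm-spectrum-scale-csi-driver logs ibm-spectrum-scale-csi-provisioner-0 --all-containers \
  | grep -iE 'snapshot-0ac7f259|mmxcp|copy' > /tmp/csi-provisioner.log
kubectl -n ibm-spectrum-scale-csi-driver describe pvc res31-tenk res32-tenk >> /tmp/csi-provisioner.log
```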


**Expected behavior**
PVCs should go to Bound state.
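
For automated verification of this expectation after the node comes back, a simple polling loop can be used; a minimal sketch (the timeout is arbitrary):

```bash
# Poll until the restored PVCs leave Pending, for up to ~10 minutes each.
for pvc in res31-tenk res32-tenk; do
  for i in $(seq 1 60); do
    phase=$(kubectl -n ibm-spectrum-scale-csi-driver get pvc "$pvc" -o jsonpath='{.status.phase}')
    [ "$phase" = "Bound" ] && break
    sleep 10
  done
  echo "$pvc: $phase"
done
```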

**Environment**
Please run the following and paste your output here:
``` bash
# Development
operator-sdk version 
go version

# Deployment
kubectl version
rpm -qa | grep gpfs
[root@r-x-master 2021_04_01-02:47:44 restore-10k-files]$ kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.4", GitCommit:"e87da0bd6e03ec3fea7933c4b5263d151aafd07c", GitTreeState:"clean", BuildDate:"2021-02-18T16:12:00Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.4", GitCommit:"e87da0bd6e03ec3fea7933c4b5263d151aafd07c", GitTreeState:"clean", BuildDate:"2021-02-18T16:03:00Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
[root@r-x-master 2021_04_01-02:47:45 restore-10k-files]$ rpm -qa | grep gpfs
gpfs.gskit-8.0.55-19.x86_64
gpfs.base-5.1.1-0.210329.152612.x86_64
gpfs.compression-5.1.1-0.210329.152612.x86_64
gpfs.gss.pmcollector-5.1.1-0.el7.x86_64
gpfs.bda-integration-1.0.3-0.noarch
gpfs.java-5.1.1-0.210329.152612.x86_64
gpfs.afm.cos-1.0.0-1.x86_64
gpfs.msg.en_US-5.1.1-0.210329.152612.noarch
gpfs.adv-5.1.1-0.210329.152612.x86_64
gpfs.gpl-5.1.1-0.210329.152612.noarch
gpfs.crypto-5.1.1-0.210329.152612.x86_64
gpfs.gss.pmsensors-5.1.1-0.el7.x86_64
gpfs.gui-5.1.1-0..noarch
gpfs.docs-5.1.1-0.210329.152612.noarch
gpfs.license.adv-5.1.1-0.210329.152612.x86_64
gpfs.librdkafka-5.1.1-0.210329.152612.x86_64
[root@r-x-master 2021_04_01-02:47:46 restore-10k-files]$
```

**Screenshots**
If applicable, add screenshots to help explain your problem.

**Additional context**
Add any other context about the problem here.

deeghuge commented 8 months ago

@saurabhwani5 please check this one

saurabhwani5 commented 8 months ago

The above issue could not be recreated; the snapshot restore completes after the node reboot.

hemalathagajendran commented 8 months ago

@saurabhwani5 please add logs and scenarios tested.