I also observed that for the 1 million file restore snapshot operation, the mmapplypolicy command was executed with only one node (the -N option of mmapplypolicy was given only a single node).
My understanding was that if the filesystem is mounted on multiple nodes, we would utilize mmxcp's parallel copy behavior by passing multiple nodes to the -N option.
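For context, the single-node invocation is visible in the log below (-N 10.11.82.111). A multi-node run would only differ in the -N argument; this is a hedged sketch with placeholder paths and node addresses, not the exact command that mmxcp builds:

# Hedged sketch: mmapplypolicy accepts a comma-separated node list (or a node
# class) via -N, which is what would spread the policy-driven work across nodes.
# The source path, policy file and extra node addresses are placeholders.
mmapplypolicy /mnt/fs2/<snapshot-source-dir> \
    -P /var/mmfs/tmp/<tmpPolicyFile> \
    -N 10.11.82.111,10.11.82.112,10.11.82.113 \
    --scope=inodespace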
Mar 11 04:05:07 t-x-master mmfs[23166]: REST-CLI root admin [EXIT, CHANGE] 'mmcrfileset fs3 pvc-92fdea41-7dc3-4510-97f1-16fe09ec6bb4 -t Fileset created by IBM Container Storage Interface driver --inode-space new --inode-limit 1600000:1600000 --allow-permission-change chmodAndSetAcl' RC=0
Mar 11 04:05:08 t-x-master systemd: Started Session c129270 of user root.
Mar 11 04:05:09 t-x-master systemd: Started Session c129271 of user root.
Mar 11 04:05:09 t-x-master mmfs[23391]: REST-CLI root admin [EXIT, CHANGE] 'mmlinkfileset fs3 pvc-92fdea41-7dc3-4510-97f1-16fe09ec6bb4 -J /mnt/fs3/pvc-92fdea41-7dc3-4510-97f1-16fe09ec6bb4' RC=0
Mar 11 04:05:10 t-x-master systemd: Started Session c129272 of user root.
Mar 11 04:05:10 t-x-master systemd: Started Session c129273 of user root.
Mar 11 04:05:12 t-x-master systemd: Started Session c129274 of user root.
Mar 11 04:05:12 t-x-master systemd: Started Session c129275 of user root.
Mar 11 04:05:13 t-x-master systemd: Started Session c129276 of user root.
Mar 11 04:05:13 t-x-master mmfs[23756]: REST-CLI root admin [EXIT, CHANGE] 'mmsetquota fs3:pvc-92fdea41-7dc3-4510-97f1-16fe09ec6bb4 --block 21474836480:21474836480' RC=0
Mar 11 04:05:24 t-x-master mmfs[25262]: REST-CLI root admin [ENTRY, CHANGE] 'mmapplypolicy /mnt/fs2/pvc-e5766ec9-080b-4d2f-b24c-feaa64717360/.snapshots/snapshot-ac78da22-4977-4946-ba8c-98547f82cc0b/pvc-e5766ec9-080b-4d2f-b24c-feaa64717360-data -P /var/mmfs/tmp/cmdTmpDir.mmxcp.24664/tmpPolicyFile -N 10.11.82.111 --scope=inodespace'
...
Mar 11 04:07:32 t-x-master systemd: Started Session c129315 of user root.
Mar 11 04:07:40 t-x-master mmfs[11822]: REST-CLI root admin [EXIT, CHANGE] 'mmcrfileset fs3 pvc-631c2241-230b-487a-94a6-336bbd5d4103 -t Fileset created by IBM Container Storage Interface driver --inode-space new --inode-limit 1600000:1600000 --allow-permission-change chmodAndSetAcl' RC=0
Mar 11 04:07:42 t-x-master systemd: Started Session c129316 of user root.
Mar 11 04:07:44 t-x-master systemd: Started Session c129317 of user root.
Mar 11 04:07:45 t-x-master mmfs[14421]: REST-CLI root admin [EXIT, CHANGE] 'mmlinkfileset fs3 pvc-631c2241-230b-487a-94a6-336bbd5d4103 -J /mnt/fs3/pvc-631c2241-230b-487a-94a6-336bbd5d4103' RC=0
Mar 11 04:07:46 t-x-master systemd: Started Session c129318 of user root.
Mar 11 04:07:48 t-x-master systemd: Started Session c129319 of user root.
Mar 11 04:07:49 t-x-master systemd: Started Session c129320 of user root.
Mar 11 04:07:49 t-x-master systemd: Started Session c129321 of user root.
Mar 11 04:07:50 t-x-master systemd: Started Session c129322 of user root.
Mar 11 04:07:51 t-x-master mmfs[17320]: REST-CLI root admin [EXIT, CHANGE] 'mmsetquota fs3:pvc-631c2241-230b-487a-94a6-336bbd5d4103 --block 21474836480:21474836480' RC=0
Mar 11 04:08:18 t-x-master mmfs[31533]: REST-CLI root admin [ENTRY, CHANGE] 'mmapplypolicy /mnt/fs2/pvc-e5766ec9-080b-4d2f-b24c-feaa64717360/.snapshots/snapshot-ac78da22-4977-4946-ba8c-98547f82cc0b/pvc-e5766ec9-080b-4d2f-b24c-feaa64717360-data -P /var/mmfs/tmp/cmdTmpDir.mmxcp.22427/tmpPolicyFile -N 10.11.82.111 --scope=inodespace'
Filesystem fs3 was mounted on all nodes:
[root@t-x-master 2021_03_11-18:25:17 tmp]$ mmlsmount fs3
File system fs3 is mounted on 3 nodes.
[root@t-x-master 2021_03_11-18:25:26 tmp]$
[root@t-x-master 2021_03_11-18:25:27 tmp]$ mmdsh -N all " df -h | grep fs3 "
t-x-worker2.fyre.ibm.com: fs3 100G 57G 44G 57% /mnt/fs3
t-x-worker1.fyre.ibm.com: fs3 100G 57G 44G 57% /mnt/fs3
t-x-master.fyre.ibm.com: fs3 100G 57G 44G 57% /mnt/fs3
[root@t-x-master 2021_03_11-18:25:36 tmp]$
Verified the issue with the following builds and it is not reproducible.
Build and test cluster OS:
quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-operator:dev
quay.io/smitaraut/ibm-spectrum-scale-csi-driver:389_x_0315
on Ubuntu20.04
Test steps
================================================
Tue 16 Mar 2021 03:01:09 AM PDT PVC State
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
fix-restore2-vs1-fs2-million-to-fs3 Pending sc4-fs3-million 6h
Tue 16 Mar 2021 03:01:09 AM PDT PVC Filecount
1048576 1048576 7277504
================================================
Tue 16 Mar 2021 03:23:48 AM PDT PVC State
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
fix-restore3-vs1-fs2-million-to-fs3 Pending sc4-fs3-million 6h18m
Tue 16 Mar 2021 03:23:48 AM PDT PVC Filecount
1048576 1048576 7277504
================================================
Tue 16 Mar 2021 05:43:39 AM PDT PVC State
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
fix-restore2-vs1-fs2-million-to-fs3 Bound pvc-f6728087-576c-4830-9856-e32b8ef68408 20Gi RWX sc4-fs3-million 8h
Tue 16 Mar 2021 05:43:39 AM PDT PVC Filecount
1048576 1048576 7277504
================================================
Tue 16 Mar 2021 06:00:03 AM PDT PVC State
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
fix-restore3-vs1-fs2-million-to-fs3 Bound pvc-1abbc71d-72c0-4d7f-ab39-03ffea557e72 20Gi RWX sc4-fs3-million 8h
Tue 16 Mar 2021 06:00:03 AM PDT PVC Filecount
1048576 1048576 7277504
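The timestamped "PVC State" / "PVC Filecount" snapshots above came from a periodic check. A minimal sketch of such a watcher follows; the PVC name and volume path are taken from the output above, but the interval and the single find-based count are assumptions (the actual test script reported three counts):

# Hypothetical watcher loop; interval and count method are assumptions.
PVC_NAME=fix-restore2-vs1-fs2-million-to-fs3
PVC_DIR=/mnt/fs3/pvc-f6728087-576c-4830-9856-e32b8ef68408   # restored fileset junction path
while true; do
    echo "$(date) PVC State"
    kubectl get pvc "$PVC_NAME"
    echo "$(date) PVC Filecount"
    find "$PVC_DIR" -type f | wc -l
    sleep 300
done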
Tried out an extra test. If the driver pod remains down until mmapplypolicy/mmxcp finishes and the driver pod comes up later, I observed that another new instance of mmapplypolicy is triggered for the same data.
Test steps
1. Started the restore using only one node (used a nodeclass).
2. Marked scale=false for the node that was used for mmapplypolicy/mmxcp, which caused the driver pod on that node to go away.
3. Waited for mmapplypolicy/mmxcp to complete; all data was copied to the destination PVC, mmapplypolicy completed with RC=0, and all mmxcp processes finished.
4. Marked scale=true again for the node and observed that another instance of mmapplypolicy/mmxcp was triggered. The files appeared to get overwritten: when I ran stat on them (see the sketch below), only ctime had changed, while atime and mtime were the same as after the first mmapplypolicy run from step 3.
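A minimal sketch of the timestamp check from step 4; the file path is a placeholder for one of the restored files:

# Print access (atime), modify (mtime) and change (ctime) timestamps for a
# restored file; after the second mmapplypolicy/mmxcp run only ctime differed.
# The path below is a placeholder, not an actual file from this test.
stat -c 'atime=%x  mtime=%y  ctime=%z' /mnt/fs3/pvc-example/file0001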
Test log
[root@swiftest1 2021_03_17-23:06:27 test_snapshot]$ kn get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
ibm-spectrum-scale-csi-66bvv 2/2 Running 0 2d1h 10.11.52.24 swiftest2.fyre.ibm.com <none> <none>
ibm-spectrum-scale-csi-attacher-0 1/1 Running 3 6d22h 10.244.162.251 swiftest2.fyre.ibm.com <none> <none>
ibm-spectrum-scale-csi-j8phx 2/2 Running 0 2d1h 10.11.52.25 swiftest3.fyre.ibm.com <none> <none>
ibm-spectrum-scale-csi-operator-8cb5f8c47-vp6n5 1/1 Running 0 2d17h 10.244.162.194 swiftest2.fyre.ibm.com <none> <none>
ibm-spectrum-scale-csi-provisioner-0 1/1 Running 3 6d22h 10.244.185.41 swiftest3.fyre.ibm.com <none> <none>
ibm-spectrum-scale-csi-snapshotter-0 1/1 Running 3 6d22h 10.244.162.252 swiftest2.fyre.ibm.com <none> <none>
[root@swiftest1 2021_03_17-23:06:29 test_snapshot]$ kubectl label node swiftest3.fyre.ibm.com scale=false --overwrite=true
node/swiftest3.fyre.ibm.com labeled
Mar 17 23:05:53 swiftest4 mmfs[3194055]: REST-CLI root admin [ENTRY, CHANGE] 'mmapplypolicy /ibm/fs1/pvc-342c1138-9c66-435a-9aff-70870f9737dd/.snapshots/snapshot-934d0f9a-be09-4721-aa63-9b56ca17e9b1/pvc-342c1138-9c66-435a-9aff-70870f9737dd-data -P /var/mmfs/tmp/cmdTmpDir.mmxcp.3193668/tmpPolicyFile -N 10.11.52.25 --scope=inodespace'
Mar 17 23:06:13 swiftest4 systemd[1]: Started Session 7041 of user root.
Mar 17 23:06:13 swiftest4 systemd[1]: session-7041.scope: Succeeded.
Mar 17 23:06:18 swiftest4 systemd[1]: Started Session 7042 of user root.
Mar 17 23:06:19 swiftest4 systemd[1]: session-7042.scope: Succeeded.
Mar 17 23:06:56 swiftest4 systemd[1]: Started Session 7043 of user root.
Mar 17 23:06:56 swiftest4 systemd[1]: session-7043.scope: Succeeded.
Mar 17 23:10:09 swiftest4 mmfs[3195644]: REST-CLI root admin [EXIT, CHANGE] 'mmapplypolicy /ibm/fs1/pvc-342c1138-9c66-435a-9aff-70870f9737dd/.snapshots/snapshot-934d0f9a-be09-4721-aa63-9b56ca17e9b1/pvc-342c1138-9c66-435a-9aff-70870f9737dd-data -P /var/mmfs/tmp/cmdTmpDir.mmxcp.3193668/tmpPolicyFile -N 10.11.52.25 --scope=inodespace' RC=0
[root@swiftest1 2021_03_17-23:10:34 test_snapshot]$ kubectl label node swiftest3.fyre.ibm.com scale=true --overwrite=true
node/swiftest3.fyre.ibm.com labeled
[root@swiftest1 2021_03_17-23:11:17 test_snapshot]$ kn get pods
NAME                                              READY   STATUS    RESTARTS   AGE
ibm-spectrum-scale-csi-66bvv                      2/2     Running   0          2d1h
ibm-spectrum-scale-csi-attacher-0                 1/1     Running   3          6d22h
ibm-spectrum-scale-csi-b6l6q                      2/2     Running   0          3s
ibm-spectrum-scale-csi-operator-8cb5f8c47-vp6n5   1/1     Running   0          2d18h
ibm-spectrum-scale-csi-provisioner-0              1/1     Running   4          6d22h
ibm-spectrum-scale-csi-snapshotter-0              1/1     Running   3          6d22h
Mar 17 23:11:24 swiftest4 mmfs[3196135]: REST-CLI root admin [EXIT, CHANGE] 'mmsetquota fs1:pvc-5edda101-edaf-49e8-aa9c-e9e5a140ccd7 --block 26843545600:26843545600' RC=0
Mar 17 23:11:28 swiftest4 mmfs[3196724]: REST-CLI root admin [ENTRY, CHANGE] 'mmapplypolicy /ibm/fs1/pvc-342c1138-9c66-435a-9aff-70870f9737dd/.snapshots/snapshot-934d0f9a-be09-4721-aa63-9b56ca17e9b1/pvc-342c1138-9c66-435a-9aff-70870f9737dd-data -P /var/mmfs/tmp/cmdTmpDir.mmxcp.3196193/tmpPolicyFile -N 10.11.52.25 --scope=inodespace'
Mar 17 23:13:11 swiftest4 systemd[1]: Started Session 7044 of user root.
Mar 17 23:13:11 swiftest4 systemd[1]: session-7044.scope: Succeeded.
Mar 17 23:14:26 swiftest4 systemd[1]: Started Session 7045 of user root.
Mar 17 23:14:26 swiftest4 systemd[1]: session-7045.scope: Succeeded.
Mar 17 23:15:01 swiftest4 CRON[3198714]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Mar 17 23:15:22 swiftest4 mmfs[3198828]: REST-CLI root admin [EXIT, CHANGE] 'mmapplypolicy /ibm/fs1/pvc-342c1138-9c66-435a-9aff-70870f9737dd/.snapshots/snapshot-934d0f9a-be09-4721-aa63-9b56ca17e9b1/pvc-342c1138-9c66-435a-9aff-70870f9737dd-data -P /var/mmfs/tmp/cmdTmpDir.mmxcp.3196193/tmpPolicyFile -N 10.11.52.25 --scope=inodespace' RC=0
Verified with CSI 2.11.0 and the issue is not reproducible.
@saurabhwani5 Please add logs and scenarios tested.
Describe the bug
Restarting driver pods during a 1 million file restore snapshot operation keeps new PVCs in Pending state.
I kept debug data for this issue on the test cluster. Please feel free to access the test system for live debugging: t-x-master.fyre.ibm.com:~/pvc-pending-when-driver-pods-restarted/
To Reproduce
Steps to reproduce the behavior:
1. Create a PVC, add 1 million files to it, and create a snapshot of this PVC, i.e. the snapshot has 1 million files.
2. Restore the 1 million file snapshot to two new PVCs simultaneously.
3. While the restore snapshot operation is in progress (i.e. mmapplypolicy/mmxcp has yet to finish), restart the driver pods and ensure the pods are up after the restart (one possible way is sketched after these steps).
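One way to restart the driver pods for step 3; the label selector and namespace below are assumptions for illustration, not taken from this cluster, so adjust them to the actual deployment:

# Delete the CSI driver (node plugin) pods so the DaemonSet recreates them.
# Label selector and namespace are assumptions, not from this test cluster.
kubectl -n ibm-spectrum-scale-csi-driver delete pod -l app=ibm-spectrum-scale-csi
# Confirm the replacement pods come back up and are Running.
kubectl -n ibm-spectrum-scale-csi-driver get pods -o wide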
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
fix-restore2-vs1-fs2-million-to-fs3 Pending sc4-fs3-million 3h46m
Thu Mar 11 07:51:40 PST 2021 PVC Filecount
1048576 1048576 7277504
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
fix-restore3-vs1-fs2-million-to-fs3 Pending sc4-fs3-million 3h44m
Thu Mar 11 07:51:40 PST 2021 PVC Filecount
1048576 1048576 7277504
[root@t-x-master 2021_03_11-17:52:45 ~]$ knpvc fix-restore2-vs1-fs2-million-to-fs3
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
fix-restore2-vs1-fs2-million-to-fs3 Pending sc4-fs3-million 13h
[root@t-x-master 2021_03_11-17:52:50 ~]$
[root@t-x-master 2021_03_11-17:52:51 ~]$ knpvc fix-restore3-vs1-fs2-million-to-fs3
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
fix-restore3-vs1-fs2-million-to-fs3 Pending sc4-fs3-million 13h
Additional context
Please note that restoring the 1 million file snapshot to two PVCs simultaneously worked fine (all files were copied and the PVCs went to Bound state) if the driver pods were not restarted; this was verified in issue #366. When the driver pods are restarted, the restored PVCs do not go to Bound, i.e. a usable state, as described in this issue.