Closed Tristan-Le1 closed 7 months ago
This issue is due to an mmxcp failure, and there is a known issue tracking it: https://github.com/IBM/ibm-spectrum-scale-csi/issues/540. Cloning fails when mmxcp fails, and in that case the user has to delete the PVC and retry cloning.
Checking further into why mmxcp is failing with:
[EFSSA0069C Command execution error: [E] Summary of errors:: _bunches of PDRs with errors:2.
In attempting to delete the offending PVC, I was able to delete the stuck Pending and Bound cloned PVCs, but the original PVC that these were cloned from (280-f-fileset-dplmnt1-pvc1) has been stuck in Terminating state for over 20 minutes.
NAMESPACE         NAME                         STATUS        VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                       AGE
280-f-namespace   280-f-fileset-dplmnt1-pvc1   Terminating   pvc-b190dbe6-9a96-4b26-9ed7-f37d2dea46cb   1Gi        RWX            280-f-fileset-csi-spectrum-scale   24h
280-f-namespace   280-f-fileset-dplmnt1-pvc2   Bound         pvc-5428166e-296e-44bc-95c8-af58758c0c4a   1Gi        RWX            280-f-fileset-csi-spectrum-scale   24h
280-f-namespace   280-f-fileset-dplmnt2-pvc1   Bound         pvc-92001e9b-3d28-4ba3-8f4b-75023a7138a0   1Gi        RWX            280-f-fileset-csi-spectrum-scale   24h
280-f-namespace   280-f-fileset-dplmnt2-pvc2   Bound         pvc-0026f214-b93f-4603-933c-19bdba5ab279   1Gi        RWX            280-f-fileset-csi-spectrum-scale   24h
280-f-namespace   280-f-fileset-dplmnt3-pvc1   Bound         pvc-2abc485d-d483-4d92-8f5b-48364c5283ab   1Gi        RWX            280-f-fileset-csi-spectrum-scale   24h
280-f-namespace   280-f-fileset-dplmnt3-pvc2   Bound         pvc-8bae9e63-22b3-4c07-883b-70a738e962ec   1Gi        RWX            280-f-fileset-csi-spectrum-scale   24h
280-f-namespace   280-f-fileset-dplmnt4-pvc1   Bound         pvc-c722c0e0-496d-4b02-a536-1d5d4a6ff5ce   1Gi        RWX            280-f-fileset-csi-spectrum-scale   24h
280-f-namespace   280-f-fileset-dplmnt4-pvc2   Bound         pvc-54cb8e3e-82c5-412b-b5c1-9a788f776ed0   1Gi        RWX            280-f-fileset-csi-spectrum-scale   24h
280-f-namespace   280-f-fileset-dplmnt5-pvc1   Bound         pvc-f528dafb-82e2-4759-8395-bb51b994edad   1Gi        RWX            280-f-fileset-csi-spectrum-scale   24h
280-f-namespace   280-f-fileset-dplmnt5-pvc2   Bound         pvc-de34b058-ca82-4c42-b39b-1a9c54f287e4   1Gi        RWX            280-f-fileset-csi-spectrum-scale   24h
Is there a suggested course of action here?
It should get deleted after some time, unless a pod is still using that PVC, in which case you need to delete the pod first and then the PVC.
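To check whether a pod is still holding the PVC, one option is to scan the pod specs for a volume that references the claim. The following is a minimal sketch, not part of the CSI driver or the test scripts: it assumes the input is the parsed JSON from `kubectl get pods -n <namespace> -o json`, and the helper name `pods_using_pvc` and the sample pod names are hypothetical.

```python
def pods_using_pvc(pod_list, claim_name):
    """Return the names of pods that mount the given PVC.

    pod_list is assumed to be the parsed output of
    `kubectl get pods -n <namespace> -o json`.
    """
    users = []
    for pod in pod_list.get("items", []):
        for volume in pod.get("spec", {}).get("volumes", []):
            claim = volume.get("persistentVolumeClaim")
            if claim and claim.get("claimName") == claim_name:
                users.append(pod["metadata"]["name"])
                break  # one match per pod is enough
    return users


# Hypothetical sample data, shaped like a Kubernetes pod list.
sample_pods = {
    "items": [
        {
            "metadata": {"name": "example-pod-a"},
            "spec": {"volumes": [
                {"name": "data",
                 "persistentVolumeClaim": {"claimName": "280-f-fileset-dplmnt1-pvc1"}},
            ]},
        },
        {
            "metadata": {"name": "example-pod-b"},
            "spec": {"volumes": [{"name": "scratch", "emptyDir": {}}]},
        },
    ]
}

print(pods_using_pvc(sample_pods, "280-f-fileset-dplmnt1-pvc1"))
```

Any pod this returns would need to be deleted before the PVC can finish terminating.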
I was able to delete the offending PVC. The same issue then occurred again with a different PVC, and I was able to delete that one too. After that, cloning worked as expected. Each time, the offending PVC took a long time to delete, while the cloned PVC deleted normally. I am finishing up the cloning test now, after getting through the errors.
Checked with Dan McNichol: the above mmxcp error occurred because GPFS was down for a bit on a node while the mmxcp job was running.
@Tristan-Le1, can you please copy the following to some path under /u/DUMPS/
and please update the path in the issue, thank you!
The requested files and outputs are resident on the system (Archie), PWD: /u/DUMPS/CSI_ISSUE_843
Just something to note, the cso yaml still came out as short as it did before.
@Tristan-Le1 could you please revisit this and see if this issue is still valid?
Just noting here so everyone is updated: Abhishek and I have conferred via email on this issue. It is to be fixed in a later iteration of CSI. When that becomes available, I will attempt to recreate the problem and close the issue.
Readjusting the labels, as there is a workaround documented for this at https://www.ibm.com/docs/en/spectrum-scale-csi?topic=troubleshooting-debugging-pvc-pending-state-issues-while-creating-multiple-volume-clones, and also to match another issue, https://github.com/IBM/ibm-spectrum-scale-csi/issues/540. @Jainbrt please revert if you disagree.
Closing, since there have been no updates for a long time.
Bug Description
10 fileset PVCs were created successfully and are in Bound state.
PVC cloning was working. Note that three PVCs cloned from 280-f-fileset-dplmnt1-pvc1 named 280-f-fileset-dplmnt1-pvc1-clone-(number) are Bound.
The fourth cloned fileset PVC (280-f-fileset-dplmnt1-pvc1-clone-4) has been stuck in Pending state for over 2 hours.
A describe of the stuck PVC clone shows these events:
To Reproduce
Scripts used to create the PVCs are resident on the system (Archie), PWD: /ibm/fs0/real-world-tests/launchers/creation_of_pvc_and_deployments
The script can be launched to create fileset PVCs with a command such as: ./launch_creation_of_pvc_and_deployments.sh -f fs0 -g (GUI IP) -u (GUI Username):(GUI Password) -P fileset-10-Mi -t blast-write-loop -n 280-f -d 5 -p 2 -s t
Scripts used to clone the PVCs are resident on the system (Archie), PWD: /ibm/fs0/real-world-tests/launchers/cloning
The script can be launched to clone PVCs with a command such as: ./clone_pvc_from_pvc.sh -n 280-f -c 280-f-fileset-csi-spectrum-scale -N 30 -P 10
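The contents of clone_pvc_from_pvc.sh are not shown in this issue. For readers unfamiliar with volume cloning, a clone request in Kubernetes is generally a new PVC whose `spec.dataSource` points at the source PVC in the same namespace. The sketch below builds such a manifest as JSON, using names and values taken from this report (1Gi, RWX); the helper name `build_clone_manifest` is hypothetical, and this is an assumption about the general shape of a clone request rather than a description of what the script does.

```python
import json


def build_clone_manifest(source_pvc, clone_name, namespace, storage_class, size="1Gi"):
    """Build a PVC manifest that clones source_pvc (same namespace required)."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": clone_name, "namespace": namespace},
        "spec": {
            "storageClassName": storage_class,
            "accessModes": ["ReadWriteMany"],  # shown as RWX in the listing
            "resources": {"requests": {"storage": size}},
            # dataSource is what marks this PVC as a clone of an existing PVC.
            "dataSource": {"kind": "PersistentVolumeClaim", "name": source_pvc},
        },
    }


manifest = build_clone_manifest(
    source_pvc="280-f-fileset-dplmnt1-pvc1",
    clone_name="280-f-fileset-dplmnt1-pvc1-clone-4",
    namespace="280-f-namespace",
    storage_class="280-f-fileset-csi-spectrum-scale",
)
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied with `kubectl apply -f`, since kubectl accepts JSON manifests as well as YAML.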
Expected Behavior
The cloned PVC should become Bound.
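A quick way to check this expectation across many clones is to filter a PVC listing for anything that is not Bound. The sketch below assumes the input is the parsed JSON from `kubectl get pvc -n <namespace> -o json`; the helper names are hypothetical. Note that Terminating is not a PVC phase: kubectl displays it when `metadata.deletionTimestamp` is set, so the sketch checks that field first.

```python
def display_status(pvc):
    """Mimic the STATUS column: Terminating if deletion is pending, else the phase."""
    if pvc.get("metadata", {}).get("deletionTimestamp"):
        return "Terminating"
    return pvc.get("status", {}).get("phase", "Unknown")


def pvcs_not_bound(pvc_list):
    """Return (name, status) pairs for every PVC whose status is not Bound."""
    result = []
    for pvc in pvc_list.get("items", []):
        status = display_status(pvc)
        if status != "Bound":
            result.append((pvc["metadata"]["name"], status))
    return result


# Hypothetical sample data modeled on the PVC names in this report.
sample_pvcs = {
    "items": [
        {"metadata": {"name": "280-f-fileset-dplmnt1-pvc1",
                      "deletionTimestamp": "2022-12-01T13:00:00Z"},
         "status": {"phase": "Bound"}},
        {"metadata": {"name": "280-f-fileset-dplmnt1-pvc1-clone-3"},
         "status": {"phase": "Bound"}},
        {"metadata": {"name": "280-f-fileset-dplmnt1-pvc1-clone-4"},
         "status": {"phase": "Pending"}},
    ]
}

for name, status in pvcs_not_bound(sample_pvcs):
    print(name, status)
```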
Environment
Scale State:
Scale Health:
Kubernetes State:
CSI State:
Red Hat Version, Kernel, and Scale Version:
Additional Context
A snap of the logs is resident on the system (Archie), PWD: /ibm/fs0/CSI/CSI-2.8.0-301122/ibm-spectrum-scale-csi/tools/ibm-spectrum-scale-csi-logs_12-01-2022-13\:48\:09/