IBM / ibm-spectrum-scale-csi

The IBM Spectrum Scale Container Storage Interface (CSI) project enables container orchestrators, such as Kubernetes and OpenShift, to manage the life-cycle of persistent storage.
Apache License 2.0

PVCs are in Pending state when multiple PVCs are created simultaneously #1038

Closed saurabhwani5 closed 1 year ago

saurabhwani5 commented 1 year ago

Describe the bug

Multiple PVCs remain in the Pending state, and the provisioner logs show the following message:

    Waited for 194.299286ms due to client-side throttling, not priority and fairness, request: PATCH:https://172.30.0.1:443/api/v1/namespaces/ibm-spectrum-scale-csi-namespace-523/events/ibm-spectrum-scale-csi-pvc-4.178951209adf7bb5
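The message quoted above is what the Kubernetes client library logs when its client-side rate limiter delays a request. To get a feel for how much time the provisioner spends waiting, the `Waited for ...` durations can be summed from a saved log. The following is a minimal sketch; the helper name `throttle_summary` is hypothetical, and it only understands durations written in `ms` or `s` (anything else, such as `1m2s`, is counted as skipped).

```shell
# throttle_summary: hypothetical helper (not part of the CSI driver or of oc).
# Reads log lines on stdin and summarises client-side throttling waits.
throttle_summary() {
  awk '
    /due to client-side throttling/ {
      d = ""
      # The duration is the token that follows "Waited for"
      for (i = 1; i < NF; i++) {
        if ($i ~ /Waited$/ && $(i + 1) == "for") { d = $(i + 2); break }
      }
      if (d ~ /^[0-9.]+ms$/)      ms = d + 0            # already milliseconds
      else if (d ~ /^[0-9.]+s$/)  ms = (d + 0) * 1000   # seconds to milliseconds
      else { skipped++; next }                          # unit not handled by this sketch
      n++
      total += ms
      if (ms > max) max = ms
    }
    END {
      printf "throttled requests: %d, total wait: %.3f ms, max wait: %.3f ms\n", n, total, max
      if (skipped > 0) printf "skipped (unrecognised duration format): %d\n", skipped
    }
  '
}
```

Assuming the provisioner log has been saved to a file (the file name here is only an example), it could be used as `throttle_summary < provisioner.log`.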

How to Reproduce?

  1. Create 2500 namespaces so that we can create multiple filesets for version 2 PVCs:

    [OCP saurabh]# oc get ns | grep ibm-spectrum-scale-csi-namespace | wc -l
    2500

  Script to create the 2500 namespaces:

    #!/bin/bash

    # Write a Namespace manifest for namespace number $1 to test.yaml
    create_pvc_yaml() {
      local ns_name="ibm-spectrum-scale-csi-namespace-$1"
      cat <<EOF > "test.yaml"
    apiVersion: v1
    kind: Namespace
    metadata:
      name: $ns_name
      labels:
        ibm-spectrum-scale-csi-test: "true"
    EOF
    }

    read -p "number of namespace to create: " num_pvc

    # Generate and apply one manifest per namespace
    for ((i = 1; i <= num_pvc; i++)); do
      create_pvc_yaml "$i"
      kubectl apply -f "test.yaml"
    done

    echo "All namespaces are created"

  2. Create 5 version 2 PVCs in each of 601 namespaces (601*5=3005 PVCs):

    [OCP saurabh]# oc get pvc -A | grep ibm-spectrum-scale-csi-pvc | wc -l
    3005
    [OCP saurabh]# oc get pvc -A | grep ibm-spectrum-scale-csi-pvc| grep Bound | wc -l
    2496
    [OCP saurabh]# oc get pvc -A | grep ibm-spectrum-scale-csi-pvc| grep Pending | wc -l
    509

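  Script to create the PVCs:

    #!/bin/bash

    # Write a PVC manifest for PVC number $1 in namespace number $2 to testpvc.yaml.
    # The post is cut off after "accessModes:"; the remaining spec values are taken
    # from the last-applied-configuration annotation shown in step 4.
    create_pvc_yaml() {
      local pvc_name="ibm-spectrum-scale-csi-pvc-$1"
      cat <<EOF > "testpvc.yaml"
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: $pvc_name
      namespace: "ibm-spectrum-scale-csi-namespace-$2"
    spec:
      accessModes:
        - ReadWriteMany
      resources:
        requests:
          storage: 100Gi
      storageClassName: ibm-spectrum-scale-csi-version2
    EOF
    }

    read -p "number of namespace in which you want to create 5 pvc each: " number_namespace
    echo "ns $number_namespace"

    # Create 5 PVCs in every namespace from 601 up to the number entered
    for ((i = 601; i <= number_namespace; i++)); do
      echo "ns $i"
      for ((j = 1; j <= 5; j++)); do
        create_pvc_yaml "$j" "$i"
        kubectl apply -f "testpvc.yaml"
      done
    done

    echo "All pvc are created"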

  3. Attach 500 pods to the PVCs (1 pod per PVC) and write data to all of them simultaneously:

    [OCP saurabh]# oc get pods -A | grep io-test-source-pod | wc -l
    500
    [OCP saurabh]# oc get pods -A | grep io-test-source-pod | grep Running | wc -l
    495
    [OCP saurabh]# oc get pods -A | grep io-test-source-pod | wc -l
    500
    [OCP saurabh]# oc get pods -A | grep io-test-source-pod | grep Pending | wc -l
    5
    [OCP saurabh]# oc get pods -A | grep io-test-source-pod | grep Pending
    ibm-spectrum-scale-csi-namespace-375   io-test-source-pod1-3   0/1   Pending   0   3d11h
    ibm-spectrum-scale-csi-namespace-384   io-test-source-pod1-3   0/1   Pending   0   3d12h
    ibm-spectrum-scale-csi-namespace-385   io-test-source-pod1-4   0/1   Pending   0   3d11h
    ibm-spectrum-scale-csi-namespace-393   io-test-source-pod1-2   0/1   Pending   0   3d11h
    ibm-spectrum-scale-csi-namespace-395   io-test-source-pod1-5   0/1   Pending   0   3d11h
    [OCP saurabh]#

As seen above, 5 pods are in the Pending state because their PVCs are not bound.

  4. Check the description of the Pending PVCs:

    [OCP saurabh]# oc get pvc -A | grep Pending | wc -l
    504
    [OCP saurabh]# oc describe pvc ibm-spectrum-scale-csi-pvc-5 -n ibm-spectrum-scale-csi-namespace-600
    Name:          ibm-spectrum-scale-csi-pvc-5
    Namespace:     ibm-spectrum-scale-csi-namespace-600
    StorageClass:  ibm-spectrum-scale-csi-version2
    Status:        Pending
    Volume:
    Labels:
    Annotations:   volume.beta.kubernetes.io/storage-provisioner: spectrumscale.csi.ibm.com
                   volume.kubernetes.io/storage-provisioner: spectrumscale.csi.ibm.com
    Finalizers:    [kubernetes.io/pvc-protection]
    Capacity:
    Access Modes:
    VolumeMode:    Filesystem
    Used By:
    Events:
      Type     Reason                Age                       From                                                                                                                          Message
      Warning  ProvisioningFailed    34m (x1082 over 3d16h)    spectrumscale.csi.ibm.com_ibm-spectrum-scale-csi-provisioner-b6566dbd4-469q2_8d9364da-4d57-46eb-9ef5-3a64b5b95714  failed to provision volume with StorageClass "ibm-spectrum-scale-csi-version2": rpc error: code = Aborted desc = volume creation already in process : pvc-49bd8385-7831-4385-a494-082bfeab7493
      Normal   Provisioning          4m43s (x1133 over 3d17h)  spectrumscale.csi.ibm.com_ibm-spectrum-scale-csi-provisioner-b6566dbd4-469q2_8d9364da-4d57-46eb-9ef5-3a64b5b95714  External provisioner is provisioning volume for claim "ibm-spectrum-scale-csi-namespace-600/ibm-spectrum-scale-csi-pvc-5"
      Normal   ExternalProvisioning  2m24s (x22532 over 3d19h) persistentvolume-controller                                                                                                   waiting for a volume to be created, either by external provisioner "spectrumscale.csi.ibm.com" or manually created by system administrator
    [OCP saurabh]#
    [OCP saurabh]# oc get pvc ibm-spectrum-scale-csi-pvc-5 -n ibm-spectrum-scale-csi-namespace-600 -oyaml
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      annotations:
        kubectl.kubernetes.io/last-applied-configuration: |
          {"apiVersion":"v1","kind":"PersistentVolumeClaim","metadata":{"annotations":{},"name":"ibm-spectrum-scale-csi-pvc-5","namespace":"ibm-spectrum-scale-csi-namespace-600"},"spec":{"accessModes":["ReadWriteMany"],"resources":{"requests":{"storage":"100Gi"}},"storageClassName":"ibm-spectrum-scale-csi-version2"}}
        volume.beta.kubernetes.io/storage-provisioner: spectrumscale.csi.ibm.com
        volume.kubernetes.io/storage-provisioner: spectrumscale.csi.ibm.com
      creationTimestamp: "2023-09-29T07:25:54Z"
      finalizers:
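The Bound and Pending counts in the steps above were gathered with one `grep | wc -l` pipeline per state. As a small convenience, the same numbers can be collected in a single pass. This is only a sketch: the helper name `count_pvc_phases` is hypothetical, and it assumes the column layout of `oc get pvc -A --no-headers` (namespace, name, status, ...).

```shell
# count_pvc_phases: hypothetical helper.
# Reads rows in the layout of "oc get pvc -A --no-headers" on stdin and
# prints one "<status> <count>" line per status for the test PVCs.
count_pvc_phases() {
  awk '$2 ~ /^ibm-spectrum-scale-csi-pvc/ { count[$3]++ }
       END { for (phase in count) print phase, count[phase] }' | sort
}
```

On the cluster it could be run as `oc get pvc -A --no-headers | count_pvc_phases`.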

Observation: All of these PVCs were created at the same time. This may be some retry timeout, because new PVCs that I created afterwards are getting bound properly:

[OCP saurabh]# oc get pvc
NAME                           STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                      AGE
ibm-spectrum-scale-csi-pvc-1   Bound    pvc-3e9c1a46-9fc2-4584-943e-f80fd2c71f9f   100Gi      RWX            ibm-spectrum-scale-csi-version2   21m
ibm-spectrum-scale-csi-pvc-2   Bound    pvc-03a8dbbe-17ac-45e7-88cc-1b36e02a97f3   100Gi      RWX            ibm-spectrum-scale-csi-version2   21m
ibm-spectrum-scale-csi-pvc-3   Bound    pvc-a9d98da8-9c1e-401f-b131-7a00de372e47   100Gi      RWX            ibm-spectrum-scale-csi-version2   21m
ibm-spectrum-scale-csi-pvc-4   Bound    pvc-66eb94e6-677b-46a2-b6bb-57f406312e0b   100Gi      RWX            ibm-spectrum-scale-csi-version2   21m
ibm-spectrum-scale-csi-pvc-5   Bound    pvc-8df4f41e-61f7-4714-b679-b4dccc3af071   100Gi      RWX            ibm-spectrum-scale-csi-version2   21m
[OCP saurabh]#
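One way to check the observation that the stuck PVCs were all created around the same time is to group the Pending PVCs by the minute of their `creationTimestamp`. The sketch below assumes input rows with four columns (namespace, name, phase, creation timestamp); the helper name `pending_by_creation_minute` is hypothetical.

```shell
# pending_by_creation_minute: hypothetical helper.
# Reads "<namespace> <name> <phase> <creationTimestamp>" rows on stdin and
# prints "<YYYY-MM-DDTHH:MM> <count>" for the Pending PVCs, oldest first.
pending_by_creation_minute() {
  awk '$3 == "Pending" { per_minute[substr($4, 1, 16)]++ }
       END { for (m in per_minute) print m, per_minute[m] }' | sort
}
```

Rows in that layout can be produced with `oc get pvc -A --no-headers -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,PHASE:.status.phase,CREATED:.metadata.creationTimestamp` and piped into `pending_by_creation_minute`. If the Pending PVCs cluster into a few adjacent minutes, that would be consistent with the observation.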

Expected behavior

All PVCs should be in the Bound state.

Logs : Mustgather: /scale-csi/D.1038

Jainbrt commented 1 year ago

Observation:

We are returning "volume creation already in process" from the CSI internal map without checking the state on the Storage Scale server.

I1004 08:06:26.593089       1 controllerserver.go:722] [fa47a944-d317-4a81-9ed7-f02a01ffae4f] volume:[pvc-94cb866a-5738-4e3f-87c8-53935a81fa65] -  IBM Storage Scale volume create params : &{pvc-94cb866a-5738-4e3f-87c8-53935a81fa65 107374182400 ibmspectrum-fs true     15203589123160586942  10000 0xc0000fa5f0 0xc0000fa5f0 primary-fileset-ibmspectrum-fs-15203589123160586942/.volumes /mnt/ibmspectrum-fs/primary-fileset-ibmspectrum-fs-15203589123160586942/.volumes ibmspectrum-fs /mnt/ibmspectrum-fs  ibmspectrum-fs     1 c35a1dd0-9ae6-4dbf-ab3c-f7192c4d3893-ibm-spectrum-scale-csi-namespace-535   false}
E1004 08:06:26.593109       1 controllerserver.go:743] [fa47a944-d317-4a81-9ed7-f02a01ffae4f] volume:[pvc-94cb866a-5738-4e3f-87c8-53935a81fa65] - volume creation already in process
E1004 08:06:26.593116       1 utils.go:69] [fa47a944-d317-4a81-9ed7-f02a01ffae4f] GRPC error: rpc error: code = Aborted desc = volume creation already in process : pvc-94cb866a-5738-4e3f-87c8-53935a81fa65
I1004 08:06:26.593128       1 utils.go:79] [fa47a944-d317-4a81-9ed7-f02a01ffae4f] Time taken to execute /csi.v1.Controller/CreateVolume request(in milliseconds): 18

amdabhad commented 1 year ago

The logs were lost and the remaining ones are not sufficient. Please recreate the issue with persistent logging enabled.

amdabhad commented 1 year ago

A GUI issue has been created for this: https://github.ibm.com/IBMSpectrumScale/scale-core/issues/6137

amdabhad commented 1 year ago

There is an issue in the GUI with the job states. Once the GUI fix for the above defect is in place, this should work fine.