IBM / ibm-spectrum-scale-csi

The IBM Spectrum Scale Container Storage Interface (CSI) project enables container orchestrators, such as Kubernetes and OpenShift, to manage the life-cycle of persistent storage.

PVC creation taking more time when 50K PVCs are created #993

Open saurabhwani5 opened 1 year ago

saurabhwani5 commented 1 year ago

Describe the bug

Currently, I am running a scale test that creates over 50,000 PVCs: 3,000 of these are independent PVCs, and the rest are dependent PVCs.

How to Reproduce?

  1. Install CSI version 2.9.0.
  2. Since PVC creation consumes significant CPU, reduce the CPU allocation for all sidecars in the deployment and set the replica count of the operator to 0:

    resources:
      limits:
        cpu: 1500m
        ephemeral-storage: 25Gi
        memory: 1500Mi
  3. Create 3,000 independent PVCs and associate 20 dependent PVCs with each independent PVC (totaling 60,000 dependent PVCs).
  4. Note that as the number of existing PVCs grows, each new PVC takes progressively longer to reach the Bound state (around 10 minutes or more); a small timing sketch follows the output below.
    
    [root@saurabh5-master ~]# oc get pods
    NAME                                                  READY   STATUS    RESTARTS        AGE
    ibm-spectrum-scale-csi-2bjh9                          3/3     Running   2 (3h32m ago)   20h
    ibm-spectrum-scale-csi-5qrsp                          3/3     Running   3 (116m ago)    20h
    ibm-spectrum-scale-csi-attacher-79849cffcb-c8kbd      1/1     Running   0               20h
    ibm-spectrum-scale-csi-attacher-79849cffcb-r78rp      1/1     Running   0               20h
    ibm-spectrum-scale-csi-provisioner-6fb458cb77-5npbm   1/1     Running   2 (3h32m ago)   20h
    ibm-spectrum-scale-csi-resizer-78b6699ff4-m7p2w       1/1     Running   0               20h
    ibm-spectrum-scale-csi-snapshotter-59fb55f65b-7vhnk   1/1     Running   0               20h
    [root@saurabh5-master ~]# kubectl top pod --namespace ibm-spectrum-scale-csi-driver
    NAME                                                  CPU(cores)   MEMORY(bytes)
    ibm-spectrum-scale-csi-2bjh9                          2m           50Mi
    ibm-spectrum-scale-csi-5qrsp                          1m           37Mi
    ibm-spectrum-scale-csi-attacher-79849cffcb-c8kbd      1m           379Mi
    ibm-spectrum-scale-csi-attacher-79849cffcb-r78rp      1m           14Mi
    ibm-spectrum-scale-csi-provisioner-6fb458cb77-5npbm   35m          684Mi
    ibm-spectrum-scale-csi-resizer-78b6699ff4-m7p2w       1m           660Mi
    ibm-spectrum-scale-csi-snapshotter-59fb55f65b-7vhnk   1m           18Mi
    [root@saurabh5-master ~]# oc get pvc -A | wc -l
    45120

[root@saurabh5-master ~]# oc get pvc | grep scale-fset-dependent-sc-2107-pvc-1
scale-fset-dependent-sc-2107-pvc-1   Bound   pvc-99206a74-3840-47b0-9055-eb217b982bfb   1Gi   RWX   ibm-spectrum-scale-csi-fileset-dependent-2107   21m
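To put a number on "taking more time", here is a minimal sketch of how the creation-to-Bound latency of a single PVC could be measured. The PVC name is a placeholder, and it assumes a kubectl release (v1.23+) that supports --for=jsonpath:

#!/bin/bash
# Hypothetical timing helper: measure how long one PVC takes to go
# from creation to Bound. The PVC name below is a placeholder.
pvc=scale-fset-timing-test-pvc
start=$(date +%s)
kubectl apply -f test.yaml   # test.yaml defines the PVC, as in the scripts below
kubectl wait pvc/$pvc --for=jsonpath='{.status.phase}'=Bound --timeout=30m
end=$(date +%s)
echo "PVC $pvc reached Bound in $((end - start)) seconds"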


Scripts used:

1. For creation of independent PVCs:

[root@saurabh5-master 50K]# cat sc.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ibm-spectrum-scale-csi-fileset-independent
provisioner: spectrumscale.csi.ibm.com
parameters:
  volBackendFs: "fs1"
  inodeLimit: "1024"
reclaimPolicy: Delete
[root@saurabh5-master 50K]# cat pvc.sh
#!/bin/bash

for (( i=1 ; i<=$1 ; i++ ))
do
  echo "apiVersion: v1" > test.yaml
  echo "kind: PersistentVolumeClaim" >> test.yaml
  echo "metadata:" >> test.yaml
  echo "  name: scale-fset-independent-pvc-$i" >> test.yaml
  echo "spec:" >> test.yaml
  echo "  accessModes:" >> test.yaml
  echo "    - ReadWriteMany" >> test.yaml
  echo "  resources:" >> test.yaml
  echo "    requests:" >> test.yaml
  echo "      storage: 1Gi" >> test.yaml
  echo "  storageClassName: ibm-spectrum-scale-csi-fileset-independent" >> test.yaml
  kubectl apply -f test.yaml
done
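The script takes the number of PVCs as its first argument; for example, `./pvc.sh 3000` generates and applies the 3,000 independent PVCs one at a time.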

2. For creation of dependent PVCs:
First, we collect the names of all the independent filesets; these are supplied as the parentFileset parameter of the dependent storage classes:

[root@saurabh5-master dep]# cat 3000sc.sh
#!/bin/bash

read -p "Enter the path to the file: " file_path

if [[ ! -f "$file_path" ]]; then
  echo "File not found: $file_path"
  exit 1
fi

# Read the file line by line
i=1
while IFS= read -r line; do
  # Process each line
  echo "apiVersion: storage.k8s.io/v1" > testsc.yaml
  echo "kind: StorageClass" >> testsc.yaml
  echo "metadata:" >> testsc.yaml
  echo "  name: ibm-spectrum-scale-csi-fileset-dependent-$i" >> testsc.yaml
  echo "provisioner: spectrumscale.csi.ibm.com" >> testsc.yaml
  echo "parameters:" >> testsc.yaml
  echo "    volBackendFs: fs1" >> testsc.yaml
  echo "    filesetType: dependent" >> testsc.yaml
  echo "    parentFileset: $line" >> testsc.yaml
  echo "reclaimPolicy: Delete" >> testsc.yaml
  kubectl apply -f testsc.yaml
  i=$((i+1))
done < "$file_path"
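For completeness, one way the input file of independent fileset names could be produced, sketched under the assumption that the driver names each independent fileset after the PV bound to its PVC (pvc-<uid>, as seen in the fileset paths later in this issue):

#!/bin/bash
# Sketch (assumption: independent filesets are named after their bound PVs,
# i.e. pvc-<uid>); write those names to a file that 3000sc.sh can consume.
oc get pvc -o jsonpath='{range .items[*]}{.spec.volumeName}{"\n"}{end}' \
  | grep '^pvc-' > independent-filesets.txt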


Creating the 50,000 dependent PVCs (in batches of 20 per storage class):

[root@saurabh5-master dep]# cat apply.sh
#!/bin/bash

for ((i=1; i<=3000; i++))
do
  while true; do
    pending_count=$(oc get pvc | grep Pending | wc -l)
    if [ "$pending_count" -eq 0 ]; then
      echo "All PVCs are in a bound state. Proceeding..."
      break
    else
      echo "There are $pending_count PVC(s) in a pending state. Waiting..."
      #sleep 10
    fi
  done
  for ((j=1; j<=20; j++))
  do
    echo "apiVersion: v1" > test.yaml
    echo "kind: PersistentVolumeClaim" >> test.yaml
    echo "metadata:" >> test.yaml
    echo "  name: scale-fset-dependent-sc-$i-pvc-$j" >> test.yaml
    echo "spec:" >> test.yaml
    echo "  accessModes:" >> test.yaml
    echo "    - ReadWriteMany" >> test.yaml
    echo "  resources:" >> test.yaml
    echo "    requests:" >> test.yaml
    echo "      storage: 1Gi" >> test.yaml
    echo "  storageClassName: ibm-spectrum-scale-csi-fileset-dependent-$i" >> test.yaml
    kubectl apply -f test.yaml
  done
done
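As an aside, the wait loop above polls oc get pvc in a tight loop because the sleep 10 is commented out. A gentler sketch of the same wait, assuming kubectl v1.23+ for --for=jsonpath support:

# Sketch: block until every PVC in the current namespace is Bound,
# instead of polling `oc get pvc` continuously (kubectl v1.23+).
kubectl wait pvc --all --for=jsonpath='{.status.phase}'=Bound --timeout=30m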



## Expected behavior
PVC creation time should remain roughly constant and not degrade as the number of existing PVCs grows.

### Data Collection and Debugging
CSI Snap : /scale-csi/D.993
amdabhad commented 12 months ago

While we have the setup where close to 50k fileset-based PVCs are created, please capture the following:

  1. Current resources set on all the CSI pods
  2. Resources in use while creating a PVC and pod
  3. Time taken for each of the following:
    • mmlsfileset <fs> <existing fileset name>:

      [root@saurabh5-scalegui ~]# date; mmlsfileset fs1 check8; date
      Mon Jul 17 23:13:54 PDT 2023
      Unable to start tslsfileset on 'fs1' because conflicting program tslsfileset is running.
      Waiting until it completes or moves to the next phase, which may allow the current command to start.
      tslsfileset on 'fs1' is finished waiting. Processing continues ...
      Filesets in file system 'fs1':
      Name      Status    Path
      check8    Linked    /ibm/fs1/pvc-bbaa3ce6-3126-4efa-b91f-ebbefc559a75/check8
      Mon Jul 17 23:21:26 PDT 2023

* REST call to GUI for the same mmlsfileset:

[root@saurabh5-master ~]# curl --insecure -u 'username:password' -X GET https://saurabh5-scalegui.fyre.ibm.com:443/scalemgmt/v2/filesystems/fs1/filesets/pvc-bbaa3ce6-3126-4efa-b91f-ebbefc559a75
{
  "filesets" : [ {
    "config" : {
      "comment" : "Fileset created by IBM Container Storage Interface driver",
      "created" : "2023-06-08 23:42:28,000",
      "iamMode" : "off",
      "id" : 2467,
      "inodeSpace" : 2467,
      "inodeSpaceMask" : 2096640,
      "isInodeSpaceOwner" : true,
      "maxNumInodes" : 1024,
      "oid" : 5866,
      "parentId" : 0,
      "path" : "/ibm/fs1/pvc-bbaa3ce6-3126-4efa-b91f-ebbefc559a75",
      "permissionChangeMode" : "chmodAndSetacl",
      "rootInode" : 1293418499,
      "snapId" : 0,
      "status" : "Linked"
    },
    "filesetName" : "pvc-bbaa3ce6-3126-4efa-b91f-ebbefc559a75",
    "filesystemName" : "fs1",
    "usage" : {
      "allocatedInodes" : 1024,
      "inodeSpaceFreeInodes" : 976,
      "inodeSpaceUsedInodes" : 48,
      "usedBytes" : 0,
      "usedInodes" : 48
    }
  } ],
  "status" : {
    "code" : 200,
    "message" : "The request finished successfully."
  }
}

* Create a new fileset using mmcrfileset:

[root@saurabh5-scalegui ~]# date; mmcrfileset fs1 check8 --inode-space pvc-bbaa3ce6-3126-4efa-b91f-ebbefc559a75; date
Mon Jul 17 22:08:53 PDT 2023
Fileset check8 created with id 55000 root inode 1293418544.
Mon Jul 17 22:08:54 PDT 2023

* Link fileset using mmlinkfileset:

[root@saurabh5-scalegui ~]# date; mmlinkfileset fs1 check8 -J /ibm/fs1/pvc-bbaa3ce6-3126-4efa-b91f-ebbefc559a75/check8; date
Mon Jul 17 22:12:30 PDT 2023
Fileset check8 linked at /ibm/fs1/pvc-bbaa3ce6-3126-4efa-b91f-ebbefc559a75/check8
Mon Jul 17 22:12:31 PDT 2023
[root@saurabh5-scalegui ~]#

* Time taken for a PVC to go from create to Bound, and the time taken for the corresponding volumeCreate call in the driver logs
* REST call to the GUI to create a fileset: this returns a job ID; keep checking the job status using that job ID until completion (a polling sketch follows the output below). Took 5 min 45 sec:

[root@saurabh5-master ~]# curl --insecure -u 'username:password' -X GET https://saurabh5-scalegui.fyre.ibm.com:443/scalemgmt/v2/jobs/1000000270533
{
  "jobs" : [ {
    "jobId" : 1000000270533,
    "status" : "COMPLETED",
    "submitted" : "2023-07-17 07:51:06,090",
    "completed" : "2023-07-17 07:56:51,061",
    "runtime" : 344971,
    "request" : {
      "data" : {
        "filesetName" : "check3",
        "inodeSpace" : "pvc-bbaa3ce6-3126-4efa-b91f-ebbefc559a75"
      },
      "type" : "POST",
      "url" : "/scalemgmt/v2/filesystems/fs1/filesets"
    },
    "result" : {
      "progress" : [ ],
      "commands" : [
        "mmcrfileset 'fs1' 'check3' --inode-space 'pvc-bbaa3ce6-3126-4efa-b91f-ebbefc559a75' --allow-permission-change 'chmodAndSetAcl' ",
        "mmlinkfileset 'fs1' 'check3' -J '/ibm/fs1/pvc-bbaa3ce6-3126-4efa-b91f-ebbefc559a75/check3' "
      ],
      "stdout" : [
        "EFSSA0194I Waiting for concurrent operation to complete.",
        "EFSSA0194I Waiting for concurrent operation to complete.",
        "EFSSA0194I Waiting for concurrent operation to complete.",
        "EFSSA0194I Waiting for concurrent operation to complete.",
        "EFSSA0194I Waiting for concurrent operation to complete.",
        "EFSSA0194I Waiting for concurrent operation to complete.",
        "EFSSG0070I File set check3 created successfully.",
        "EFSSG0078I File set check3 successfully linked.\n"
      ],
      "stderr" : [ ],
      "exitCode" : 0
    },
    "pids" : [ ]
  } ],
  "status" : {
    "code" : 200,
    "message" : "The request finished successfully."
  }
}
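A minimal sketch of that polling loop (host, credentials, and job ID are placeholders copied from above; assumes jq is available and that the GUI reports the job status as RUNNING until it finishes):

#!/bin/bash
# Sketch: poll a Spectrum Scale GUI job until it leaves the RUNNING state.
# The job ID, credentials, and host below are placeholders.
JOBID=1000000270533
while true; do
  status=$(curl --insecure -s -u 'username:password' \
    "https://saurabh5-scalegui.fyre.ibm.com:443/scalemgmt/v2/jobs/$JOBID" \
    | jq -r '.jobs[0].status')
  [ "$status" != "RUNNING" ] && break
  sleep 5
done
echo "Job $JOBID finished with status $status"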

amdabhad commented 10 months ago

An RTC issue has been opened for this: https://jazz07.rchland.ibm.com:21443/jazz/web/projects/GPFS#action=com.ibm.team.workitem.viewWorkItem&id=316467