IBM / ibm-spectrum-scale-csi

The IBM Spectrum Scale Container Storage Interface (CSI) project enables container orchestrators, such as Kubernetes and OpenShift, to manage the life-cycle of persistent storage.
Apache License 2.0

Time Required to create version 2 PVC is more when discover_cg_fileset is enabled #1012

Closed saurabhwani5 closed 7 months ago

saurabhwani5 commented 1 year ago

Describe the bug

When 100 independent version 1 PVCs (filesets) already exist, creating a version 2 PVC takes longer when VAR_DRIVER_DISCOVER_CG_FILESET: ENABLED is set.

How to Reproduce?

  1. Install CSI with Images #1004 having VAR_DRIVER_DISCOVER_CG_FILESET: ENABLED
    [root@api.saurabh56.cp.fyre.ibm.com pr1004]# oc get pods
    NAME                                                  READY   STATUS    RESTARTS   AGE
    ibm-spectrum-scale-csi-attacher-c6694f984-9g29s       1/1     Running   0          71m
    ibm-spectrum-scale-csi-attacher-c6694f984-xrkbg       1/1     Running   0          71m
    ibm-spectrum-scale-csi-g95w2                          3/3     Running   0          71m
    ibm-spectrum-scale-csi-njkk6                          3/3     Running   0          71m
    ibm-spectrum-scale-csi-operator-7c5df54f55-f8749      1/1     Running   0          4h11m
    ibm-spectrum-scale-csi-provisioner-b4bc74f59-msl6g    1/1     Running   0          71m
    ibm-spectrum-scale-csi-resizer-f95565954-zf7kq        1/1     Running   0          71m
    ibm-spectrum-scale-csi-snapshotter-5944fbbfdf-nhj7g   1/1     Running   0          71m
    ibm-spectrum-scale-csi-x6mg5                          3/3     Running   0          71m
    [root@api.saurabh56.cp.fyre.ibm.com pr1004]#
    [root@api.saurabh56.cp.fyre.ibm.com pr1004]#
    [root@api.saurabh56.cp.fyre.ibm.com pr1004]# oc get cso
    NAME                     VERSION   SUCCESS
    ibm-spectrum-scale-csi   2.10.0    True
    [root@api.saurabh56.cp.fyre.ibm.com pr1004]# oc exec -it ibm-spectrum-scale-csi-g95w2 -- env
    PATH=/chroot:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
    TERM=xterm
    HOSTNAME=worker2.saurabh56.cp.fyre.ibm.com
    NSS_SDB_USE_CACHE=no
    master0.saurabh56.cp.fyre.ibm.com=master0
    SHORTNAME_NODE_MAPPING=yes
    DISCOVER_CG_FILESET=ENABLED
  2. Create 100 version 1 independent PVCs, which creates 100 filesets
    [root@api.saurabh56.cp.fyre.ibm.com pr1004]# oc get pvc | grep ibm-spectrum-scale-csi-fileset-independent -c
    100
  3. Create a CG fileset:
    The command to create the CG fileset is given below, after the last step
    [root@worker0 local-sample]# ls | grep default
    31223663-891c-42dd-b826-000000000005-default
  4. Measure the time required to create a version 2 PVC when the CG fileset is already present
    (Note: in all of the examples below I consider the time required to create the second PVC onwards, because the first PVC creation after applying the optional ConfigMap takes longer.)
    
    [root@api.saurabh56.cp.fyre.ibm.com pr1004]# oc apply -f apply.yaml
    Warning: would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "web-server" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "web-server" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "web-server" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "web-server" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
    pod/csi-scale-fsetdemo-pod created
    persistentvolumeclaim/scale-advance-pvc created
    storageclass.storage.k8s.io/ibm-spectrum-scale-csi-advance created
    [root@api.saurabh56.cp.fyre.ibm.com pr1004]# cat apply.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: csi-scale-fsetdemo-pod
      labels:
        app: nginx
    spec:
      containers:
      - name: web-server
        image: nginx
        volumeMounts:
          - name: mypvc
            mountPath: /usr/share/nginx/html/scale
        ports:
        - containerPort: 80
      volumes:
      - name: mypvc
        persistentVolumeClaim:
          claimName: scale-advance-pvc
          readOnly: false
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: scale-advance-pvc
    spec:
      accessModes:

    ---
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: ibm-spectrum-scale-csi-advance
    provisioner: spectrumscale.csi.ibm.com
    parameters:
      volBackendFs: "local-sample"
      version: "2"
    reclaimPolicy: Delete

    scale-advance-pvc   Pending   pvc-5c850e74-60e5-458d-a54b-336790b9cb44   0           ibm-spectrum-scale-csi-advance   15s
    scale-advance-pvc   Bound     pvc-5c850e74-60e5-458d-a54b-336790b9cb44   1Gi   RWX   ibm-spectrum-scale-csi-advance   15s
    [root@worker0 31223663-891c-42dd-b826-000000000005-default]# ls
    pvc-5c850e74-60e5-458d-a54b-336790b9cb44
    [root@worker0 31223663-891c-42dd-b826-000000000005-default]# pwd
    /mnt/local-sample/31223663-891c-42dd-b826-000000000005-default

    Time required is 15 sec
  6. Measure the time required to create a version 2 PVC when the CG fileset is not present:
    scale-advance-pvc   Pending   pvc-e3e24667-68bf-46cf-9958-48dc74e8959b   0           ibm-spectrum-scale-csi-advance   24s
    scale-advance-pvc   Bound     pvc-e3e24667-68bf-46cf-9958-48dc74e8959b   1Gi   RWX   ibm-spectrum-scale-csi-advance   24s

    Time required is 24 sec
  7. Apply the following ConfigMap to disable it:
    [root@api.saurabh56.cp.fyre.ibm.com pr1004]# cat cm.yaml
    kind: ConfigMap
    apiVersion: v1
    metadata:
      name: ibm-spectrum-scale-csi-config
      namespace: ibm-spectrum-scale-csi
    data:
      VAR_DRIVER_DISCOVER_CG_FILESET: DISABLED
    [root@api.saurabh56.cp.fyre.ibm.com pr1004]# oc apply -f cm.yaml
    configmap/ibm-spectrum-scale-csi-config created
    [root@api.saurabh56.cp.fyre.ibm.com pr1004]# oc project ibm-spectrum-scale-csi
    Now using project "ibm-spectrum-scale-csi" on server "https://api.saurabh56.cp.fyre.ibm.com:6443".
    [root@api.saurabh56.cp.fyre.ibm.com pr1004]# oc get pods
    NAME                                                  READY   STATUS    RESTARTS   AGE
    ibm-spectrum-scale-csi-attacher-79988f966c-8lpn6      1/1     Running   0          63s
    ibm-spectrum-scale-csi-attacher-79988f966c-bcjvq      1/1     Running   0          63s
    ibm-spectrum-scale-csi-operator-7c5df54f55-f8749      1/1     Running   0          4h39m
    ibm-spectrum-scale-csi-provisioner-849d44564d-cv5h7   1/1     Running   0          63s
    ibm-spectrum-scale-csi-resizer-97b5fdfff-vwsrn        1/1     Running   0          63s
    ibm-spectrum-scale-csi-snapshotter-6fd4dc599d-rmhz2   1/1     Running   0          63s
    ibm-spectrum-scale-csi-srf2l                          3/3     Running   0          56s
    ibm-spectrum-scale-csi-v67n6                          3/3     Running   0          59s
    ibm-spectrum-scale-csi-wxrpm                          3/3     Running   0          62s
    [root@api.saurabh56.cp.fyre.ibm.com pr1004]# oc get cso
    NAME                     VERSION   SUCCESS
    ibm-spectrum-scale-csi   2.10.0    True
    [root@api.saurabh56.cp.fyre.ibm.com pr1004]# oc exec -it ibm-spectrum-scale-csi-srf2l -- env
    PATH=/chroot:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
    TERM=xterm
    HOSTNAME=worker0.saurabh56.cp.fyre.ibm.com
    NSS_SDB_USE_CACHE=no
    NODEPUBLISH_METHOD=BINDMOUNT
    CSI_ENDPOINT=unix:///var/lib/kubelet/plugins/spectrumscale.csi.ibm.com/csi.sock
    worker0.saurabh56.cp.fyre.ibm.com=worker0
    worker2.saurabh56.cp.fyre.ibm.com=worker2
    master0.saurabh56.cp.fyre.ibm.com=master0
    PERSISTENT_LOG=DISABLED
    worker1.saurabh56.cp.fyre.ibm.com=worker1
    SHORTNAME_NODE_MAPPING=yes
    IS_OpenShift=True
    VOLUME_STATS_CAPABILITY=ENABLED
    KUBELET_ROOT_DIR_PATH=/var/lib/kubelet
    SKIP_MOUNT_UNMOUNT=yes
    NODE_ID=worker0.saurabh56.cp.fyre.ibm.com
    master1.saurabh56.cp.fyre.ibm.com=master1
    CSI_CG_PREFIX=31223663-891c-42dd-b826-3be564ddea55
    DISCOVER_CG_FILESET=DISABLED

  8. Measure the time required to create a version 2 PVC when the CG fileset is already present:
    [root@worker0 local-sample]# ls | grep default
    31223663-891c-42dd-b826-000000000005-default

    scale-advance-pvc   Pending   pvc-1d1281fc-23bb-4d14-afa2-58ed00bd07d3   0           ibm-spectrum-scale-csi-advance   15s
    scale-advance-pvc   Bound     pvc-1d1281fc-23bb-4d14-afa2-58ed00bd07d3   1Gi   RWX   ibm-spectrum-scale-csi-advance   15s

    Time required is 15 sec

  9. Measure the time required to create a version 2 PVC when the CG fileset is not present:
    scale-advance-pvc   Pending   pvc-7875e517-f008-4f2d-8df1-8a28278a15db   0           ibm-spectrum-scale-csi-advance   12s
    scale-advance-pvc   Bound     pvc-7875e517-f008-4f2d-8df1-8a28278a15db   1Gi   RWX   ibm-spectrum-scale-csi-advance   12s

    Time required is 15 sec

Command to create the CG fileset:

    mmcrfileset local-sample 31223663-891c-42dd-b826-000000000005-default --inode-space=new -t "Fileset created by IBM Container Storage Interface driver"
    mmlinkfileset local-sample 31223663-891c-42dd-b826-000000000005-default
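
The "Time required" figures in the steps above appear to be read from the AGE column of the PVC status lines at the moment the claim turns Bound. A minimal sketch of automating that reading is shown here; the helper names `age_to_seconds` and `seconds_to_bound` are hypothetical, and it assumes the status lines look like the ones pasted in this issue, with AGE as the last column.

```python
import re
from typing import Iterable, Optional

# Seconds per unit for the AGE formats seen in this issue (e.g. 15s, 71m, 4h11m).
_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def age_to_seconds(age: str) -> int:
    """Convert an AGE value such as '15s' or '4h11m' to seconds."""
    parts = re.findall(r"(\d+)([smhd])", age)
    # Reject anything that is not fully made of <number><unit> pairs.
    if not parts or "".join(n + u for n, u in parts) != age:
        raise ValueError(f"unrecognised AGE value: {age!r}")
    return sum(int(n) * _UNITS[u] for n, u in parts)


def seconds_to_bound(status_lines: Iterable[str], pvc_name: str) -> Optional[int]:
    """Return the AGE (in seconds) on the first Bound line for pvc_name."""
    for line in status_lines:
        fields = line.split()
        if len(fields) >= 3 and fields[0] == pvc_name and fields[1] == "Bound":
            return age_to_seconds(fields[-1])
    return None  # the claim never reached Bound in the captured lines
```

The AGE column only has one-second resolution and includes scheduling and provisioner queueing time, so it is a coarse measure compared with the CreateVolume timing in the driver logs.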



Expected behavior

PVC creation time should be reduced for the case where discover_cg_fileset is enabled and no CG fileset is present; in that case the driver currently checks all filesets.

saurabhwani5 commented 1 year ago

Uploaded the CSI snap here: /scale-csi/D.1012

amdabhad commented 1 year ago

This is expected because of the additional REST call and handling added for RDR when VAR_DRIVER_DISCOVER_CG_FILESET: ENABLED is set, and it will most likely not be fixed in CSI 2.10.

amdabhad commented 1 year ago

Along with the time taken for a PVC to reach the Bound state, the actual time the CSI driver takes for the CreateVolume request, as reported in the logs, will help establish exact numbers here, e.g.

I0908 06:11:11.617869       1 utils.go:79] [d8df41bc-e6d4-48fc-8432-1a580c7279bb] Time taken to execute /csi.v1.Controller/CreateVolume request(in milliseconds): 27059
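
A small sketch of pulling that number out of the driver logs follows. It assumes the log lines have exactly the shape of the example above; the function name `parse_request_time` is hypothetical and is not part of the driver.

```python
import re
from typing import Optional, Tuple

# Matches the "[request-id] Time taken to execute <method> request(in milliseconds): N"
# part of a driver log line, as seen in the example above.
_TIMING = re.compile(
    r"\[(?P<req>[0-9a-f-]+)\] Time taken to execute "
    r"(?P<method>\S+) request\(in milliseconds\): (?P<ms>\d+)"
)


def parse_request_time(line: str) -> Optional[Tuple[str, str, int]]:
    """Return (request_id, method, milliseconds), or None if the line has no timing."""
    match = _TIMING.search(line)
    if match is None:
        return None
    return match.group("req"), match.group("method"), int(match.group("ms"))
```

Filtering the result on the method `/csi.v1.Controller/CreateVolume` leaves only the provisioning calls, which can then be compared across builds and settings.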

saurabhwani5 commented 1 year ago

With DISABLED, PVC creation is taking more than 60 sec; we need to check why this takes longer:

I0908 07:20:47.204830       1 utils.go:79] [47ac119a-dbad-42c3-910a-6bc0001ae112] Time taken to execute /csi.v1.Controller/CreateVolume request(in milliseconds): 62375

amdabhad commented 1 year ago

DISABLED taking more time was observed by Saurabh in a dev build before PR#1004, and I think with the CSI 2.8.0 build as well. It is likely due to delay from Scale/GUI/network. Please try the combinations of Scale cluster/CSI/k8s/OCP where you see smaller or larger delays, to identify whether it is due to a particular Scale/GUI version or a fyre delay.

For the ENABLED case too, observe and post the time taken for CreateVolume in the logs for multiple volumes created at various times on various Scale clusters (while also capturing the CreateVolume time for DISABLED or for a build before PR#1004, so that the difference is seen on the same cluster).
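
One way to make such a comparison readable is to summarise the collected CreateVolume times per setting. The sketch below is only an illustration: the function name `summarize` and the labels used as keys are hypothetical, and the millisecond values have to be gathered from the driver logs of the cluster under test.

```python
from statistics import mean, median
from typing import Dict, List


def summarize(durations_ms: Dict[str, List[int]]) -> Dict[str, Dict[str, float]]:
    """Summarise CreateVolume durations (milliseconds) per label, in seconds."""
    summary: Dict[str, Dict[str, float]] = {}
    for label, values in durations_ms.items():
        if not values:
            continue  # skip labels with no samples instead of failing
        summary[label] = {
            "count": len(values),
            "mean_s": mean(values) / 1000,
            "median_s": median(values) / 1000,
            "max_s": max(values) / 1000,
        }
    return summary
```

Calling it with one list per setting, for example one keyed "ENABLED" and one keyed "DISABLED" from the same cluster, shows whether the difference is consistent across volumes or comes from a single slow request.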

Jainbrt commented 10 months ago

@amdabhad @saurabhwani5 is this still a valid defect ?

amdabhad commented 10 months ago

@Jainbrt when CG fileset discovery is ENABLED and there are multiple independent filesets on the storage created by CSI, the delay is expected due to the additional processing added in CSI 2.10 for RDR. So from the test side, unless you want to test delays for any scenarios in the DISABLED case, this can be closed (similar delays were also observed on one of Saurabh's clusters with CSI 2.8.0 anyway).

Jainbrt commented 7 months ago

Based on the last comment, we can close this defect.