[QUESTION]: Dell Powerscale CSI - Can't mount multiple static PVs into one pod

wrender commented 11 months ago

How can the Team help you today?

Can't mount multiple static PVs into one pod

Details: ?

I have a strange requirement, to have several PVs, pointed to each IP on a dell powerscale. We want the software to load balance to the different IPs by using different mount points in the pod and not use smart connect. So for example, we have one POD running on a single node, and that is connecting to the Dell Powerscale via CSI. When I create the PVs and configure them to use the CSI driver, and mount multiple NFS exports into a single POD the pod fails to start. If I mount just a single PV using CSI into the pod it starts fine and mounts it correctly. If I don't use CSI, and create the PVs just as a regular Kubernetes NFS Export, I can mount all 9 mounts into the singe POD fine. Here is my configuration that fails:

apiVersion: v1
kind: Pod
metadata:
  name: static-prov-pod
  namespace: default
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 1000
    fsGroup: 1000
  containers:
    - name: test
      image: harbor.domain.local/library/rocky-test:v1
      command: [ "/bin/sleep", "3600" ]
      volumeMounts:
        - mountPath: "/data0"
          name: pvol
        - mountPath: "/data1"
          name: pvol1
      securityContext:
        allowPrivilegeEscalation: false
        runAsNonRoot: true
        capabilities:
          drop: ["ALL"]
        seccompProfile:
          type: RuntimeDefault
  volumes:
    - name: pvol
      persistentVolumeClaim:
        claimName: static-nfs-pvc-10.0.1.21
    - name: pvol1
      persistentVolumeClaim:
        claimName: static-nfs-pvc-10.0.1.22
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: static-nfs-pv-10.0.1.21
spec:
  accessModes:
    - "ReadOnlyMany"
  capacity:
    storage: "10000Mi"
  persistentVolumeReclaimPolicy: Retain
  storageClassName: "isilon-nfsv3"
  csi:
    driver: csi-isilon.dellemc.com
    volumeAttributes:
        Path: "/ifs/data/data"
        Name: "data"
        AzServiceIP: 10.0.1.21
    volumeHandle: data=_=_=43=_=_=System=_=_=Cluster
  claimRef:
    name: static-nfs-pvc-10.0.1.21
    namespace: default
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: static-nfs-pvc-10.0.1.21
  namespace: default
spec:
  accessModes:
    - ReadOnlyMany
  resources:
    requests:
      storage: 10000Mi
  volumeName: static-nfs-pv-10.0.1.21
  storageClassName: isilon-nfsv3
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: static-nfs-pv-10.0.1.22
spec:
  accessModes:
    - "ReadOnlyMany"
  capacity:
    storage: "10000Mi"
  persistentVolumeReclaimPolicy: Retain
  storageClassName: "isilon-nfsv3"
  csi:
    driver: csi-isilon.dellemc.com
    volumeAttributes:
        Path: "/ifs/data/data"
        Name: "data"
        AzServiceIP: 10.0.1.22
    volumeHandle: data=_=_=43=_=_=System=_=_=Cluster
  claimRef:
    name: static-nfs-pvc-10.0.1.22
    namespace: default
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: static-nfs-pvc-10.0.1.22
  namespace: default
spec:
  accessModes:
    - ReadOnlyMany
  resources:
    requests:
      storage: 10000Mi
  volumeName: static-nfs-pv-10.0.1.22
  storageClassName: isilon-nfsv3

shanmydell commented 11 months ago

@cbartoszDell : Please look into the query above

wrender commented 10 months ago

Any help appreciated. 😀

shanmydell commented 10 months ago

@wrender : This query will be addressed shortly.

adarsh-dell commented 10 months ago

Hello @wrender , thank you for bringing this to our notice. Apologies for the delayed response. Could you kindly provide answers to the following questions to assist us in resolving your query?

How many arrays/clusters are configured in the secret.yaml file?
Does the storage class 'isilon-nfsv3' have ClusterName and AzServiceIP parameters? Please share the SC.yaml.
Have you created the PV/Volume directly over the Array using the GUI, or have you created the PV earlier via dynamic provisioning and then tried to consume it via static provisioning?
Regarding the statement "If I mount just a single PV using CSI into the pod, it starts fine and mounts it correctly," have you performed the same operation with both PVs? If yes, did it work for both PVs when mounted alone to the same pod one by one?
Can you please share the logs of the node & controller pod's driver container?
Could you share the cluster name visible in the volumeHandle: data=_=_=43=_=_=System=_=_=Cluster of the PV?
Have you ever tried to mount multiple NFS volumes/exports to the same pod using dynamic provisioning, and did it work? If not, could you please try and share the logs to help us analyze the issue more quickly?

wrender commented 10 months ago

Hi @adarsh-dell

We have two different clusters defined in secrets.yaml
The storage class has ClusterName, but it does not have AzServiceIP parameter set. I'm only setting that on the PVs.
The Volume was initially created using the OneFS UI.
Yes. I tested this. If I uncomment the first PV, and mount the second PV into the pod it works. I just can't do both simultaneously.
I can't really share the logs, but I do notice, that the driver container logs on the node only show the name of one PV (even though the pod is requesting two), and the controller driver logs show it requesting 2 different mounts from the API. I don't see any errors.
I can't share the logs, but the cluster name is just a single word just: volumeHandle: data=_=_=43=_=_=System=_=_=Ourcluster
I can test this and get back to you.

adarsh-dell commented 10 months ago

Thank you, @wrender, for your response.

While analyzing the issue without logs may present challenges in identifying the root cause, we will attempt to reproduce it in our environment. In the meantime, as you explore our 7th query, could you please confirm whether both PVs are being created on two different arrays/clusters or on the same one?

Thanks Adarsh

adarsh-dell commented 10 months ago

@wrender

If both PVs are on different arrays, I think that only one clusterName will be present in the StorageClass (SC), as both PVCs are referencing the same SC.
While this might not be the cause, it does appear unusual.
I believe that the volumeHandle attribute, specifically data=_=_=43=_=_=System=_=_=**_Ourcluster_**, indicates a different OurCluster in both PVs, matching the clusterName configured in the secret.yaml.
Please attempt to mount two PVs from the same array/cluster and observe if this operation is successful in your scenario.

wrender commented 10 months ago

Just to clarify, only one clusterName is present in our StorageClass (SC). We have a separate storageClass for each cluster. The tests I am running are trying to mount two PVs from the same cluster into one pod. The Cluster name in the volumeHandle matches one of the names of the clusters we have defined in the secrets.yaml file.

adarsh-dell commented 10 months ago

Hi @wrender,

Have you had an opportunity to test dynamic provisioning for mounting multiple volumes?
Just for your information, we attempted to reproduce the same scenario and kept nearly everything identical to yours, _except for AzServiceIP._ It was set to the exact IP of the cluster/array due to certain constraints. Interestingly, the driver successfully mounted both volumes, and the pod transitioned into the running state. However, during our initial attempt with only one PV, it did take some time, but within less than a minute, it moved into the running state.
Have you ever attempted to mount multiple volumes while keeping the AzServiceIP the same as the IP of the cluster/array? If so, what was the behavior of the CSI driver? Was the mounting operation successful, and did the pod transition into the running state?

wrender commented 9 months ago

hello @adarsh-dell . I have not yet had a chance to test dynamic provisioning for multiple volumes. I will try and test that soon and get back to you. I am just curious what OS and Kubernetes version are you using? I am using Ubuntu 20.04.x, and Kubernetes 1.26.9 provisioned with RKE2 from Rancher. I did just quickly test using the same AZServiceIP IP for both PVs which point to the same node on a powerscale cluster and it also fails to mount. The same issue occurs. Here is the error I always get: Warning FailedMount 51s (x2 over 3m8s) kubelet Unable to attach or mount volumes: unmounted volumes=[pv-storage-01], unattached volumes=[pv-storage-01 pv-storage-02 kube-api-access-dsg4n]: timed out waiting for the condition

adarsh-dell commented 9 months ago

Thanks @wrender .

Here are the details of the OS and CO that we are currently utilizing.

While I don't have access to your setup logs, it appears that Kubernetes might not be initiating requests to the CSI-Driver. It's not necessarily a failure on the driver's part in performing or executing the mount operation. To troubleshoot, please examine the logs of the attacher sidecar in the controller pod. Cross-check if the controllerPublish request is present for both Persistent Volumes (PVs) to identify any potential issues.
Following your previous comment, if the controllerPublish request is confirmed to be present for both Persistent Volumes (PVs), kindly inspect the logs within the driver container on the node pod's end for the NodePublishVolume call.

wrender commented 9 months ago

Hi @adarsh-dell I took a look at the attacher sidecar in the controller pod like you asked. It appears for some reason that there is only one controllerPublish entry for one PV. Even though my pod is specifying two PVs.

adarsh-dell commented 9 months ago

Hi @wrender , Unexpectedly, the driver (which is a container and part of the controller pod) responsibility comes later in this scenario. Since I cannot access the logs, I suspect there might be an issue with the container orchestrator itself. Now, I suggest attempting dynamic provisioning as previously mentioned. Have you had a chance to try it out? Following that, you can test the same scenarios in your lab with different container orchestrators and operating systems. Ensure that you utilize the most recent driver version that incorporates the latest sidecars.

shanmydell commented 9 months ago

@wrender : Any updates after incorporating the suggestions from @adarsh-dell

wrender commented 9 months ago

@shanmydell and @adarsh-dell.

I can confirm. If I create a PVC/PV using dynamic provisioning, and then do a volumemount into my pod with those 2 different dynamic PVCs, it mounts both properly.

If I create a static CSI NFS Export, and manually add the PVs and PVCs following the instructions here: https://dell.github.io/csm-docs/docs/csidriver/features/powerscale/#consuming-existing-volumes-with-static-provisioning Trying to mount into one pod, the same export, using two different AzServiceIP destinations (to target two different nodes in the cluster) it fails.

If I create a static NFS export following the instructions here: https://dell.github.io/csm-docs/docs/csidriver/features/powerscale/#consuming-existing-volumes-with-static-provisioning and mount only one PV/PVC path into the single pod it works.

If I create static PVs without using the CSI driver, and target two different powerscale nodes, with the same mount, and mount them into one pod it works fine.

Unfortunately at the moment I don't have the resources to test a different OS, or container runtime other than RKE2 and Containerd.

shanmydell commented 9 months ago

@wrender : Please confirm the version of the driver @adarsh-dell : Do we support the above usecase?

wrender commented 9 months ago

Ubuntu 20.04.x, and Kubernetes 1.26.9 provisioned with RKE2 from Rancher. CSI powerscale Release v2.8.0

adarsh-dell commented 9 months ago

Hi @wrender , I am checking it with @nitesh3108 and will confirm you once there is some update.

Thanks

wrender commented 9 months ago

@adarsh-dell . I'm not sure if this could be related, our driver is set to use quotas. But this particular static mount, that we are trying to target different AzServiceIP with does not have a quota set on it. These two errors do show up in the logs:

level=info msg="/csi.v1.Controller/ControllerExpandVolume: REP 0054: rpc error: code = NotFound desc = failed to get quota: No quota set on the volume 'data1'"

GRPC error: rpc error: code = NotFound desc = failed to get quota: No quota set on the volume 'data1'

Do you think it's possible not having a CSI_QUOTA_ID may be causing this multiple mount to one pod to fail? I am also wondering, is it possible to disable the quota checking for certain CSI PVs/PVCs?

adarsh-dell commented 9 months ago

Hi @nitesh3108 Any help here?

shanmydell commented 3 months ago

@nitesh3108 @adarsh-dell @shefali-malhotra : Please drive this to closure

dell / csm

[QUESTION]: Dell Powerscale CSI - Can't mount multiple static PVs into one pod #1052

How can the Team help you today?