dell / csm

Dell Container Storage Modules (CSM)

[QUESTION]: storage_csm_powerflex_v2101.yaml is not working #1450

Open edga-silva opened 2 months ago

edga-silva commented 2 months ago

How can the Team help you today?

Details: Hi team,

I am trying to connect my OpenShift v4.15 cluster to Dell PowerFlex v3.6.2, but I am stuck: the Container Storage Module sample provided on GitHub (storage_csm_powerflex_v2101.yaml) is not working for some reason.

I have followed "Red Hat OpenShift 4.14 deployment on Dell PowerFlex", and this is what is currently configured:

jooseppi-luna commented 2 months ago

@edga-silva sorry to hear you're having issues! A few questions for you to help us figure this out: 1) I don't see you mention the csm-operator -- do you have the csm-operator installed? The YAML file you link to only works if the csm-operator is installed. Here are instructions for installing the csm-operator and instructions for installing the PowerFlex driver with csm-operator. 2) If you do have the operator installed, can you provide the following logs/outputs?

  1. any relevant error output or logs with context
  2. output of oc get pods -n <operator namespace> && oc get pods -n <powerflex namespace>
  3. logs from the csm-operator controller: oc logs dell-csm-operator-controller-manager-XXXX-XX -n <operator namespace>
  4. logs from the driver controller pod: oc logs powerflex-controller-XXXX-XX -n powerflex
  5. logs from a driver worker pod: oc logs powerflex-node-XXXX-XX -n powerflex
  6. CSM object description: run oc get csm -A, then oc describe csm -n <NAMESPACE> <NAME> for each CSM object.
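
The requests in the list above can be gathered in one pass. The following is a minimal Python sketch, assuming `oc` is on the PATH and already logged in to the cluster. The function names, the output directory `csm-diagnostics`, and the argument names are hypothetical, and the namespaces and pod names must be replaced with your own.

```python
import subprocess
from pathlib import Path


def diagnostic_commands(operator_ns, driver_ns, operator_pod, controller_pod, node_pod):
    """Return the oc commands from the list above as argument lists."""
    return [
        ["oc", "get", "pods", "-n", operator_ns],
        ["oc", "get", "pods", "-n", driver_ns],
        ["oc", "logs", operator_pod, "-n", operator_ns],
        ["oc", "logs", controller_pod, "-n", driver_ns],
        ["oc", "logs", node_pod, "-n", driver_ns],
        ["oc", "get", "csm", "-A"],
    ]


def collect(commands, out_dir="csm-diagnostics"):
    """Run each command and save its stdout and stderr to a numbered text file."""
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    for index, cmd in enumerate(commands, start=1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        (out / f"{index:02d}.txt").write_text(
            "$ " + " ".join(cmd) + "\n" + result.stdout + result.stderr
        )


# Example call (pod names are placeholders, as in the list above):
# collect(diagnostic_commands("<operator namespace>", "powerflex",
#                             "dell-csm-operator-controller-manager-XXXX-XX",
#                             "powerflex-controller-XXXX-XX",
#                             "powerflex-node-XXXX-XX"))
```

The `oc describe csm` step is left out because it depends on the names returned by `oc get csm -A`. Since the driver pods run several containers, adding `--all-containers` to the `oc logs` calls may be worthwhile so that no container's output is missed.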

jooseppi-luna commented 2 months ago

@edga-silva looking at the tutorial you linked to, I see it is pretty old -- it is using the dell-csi-operator, which has since been replaced by the csm-operator that I linked to above. I would recommend following the docs I linked to instead of that video for installing the PowerFlex CSI driver.

edga-silva commented 2 months ago

Thanks @jooseppi-luna for the info and links.

I have followed the instructions for installing the csm-operator via OperatorHub, but the documentation seems to be missing the part on how to create an instance of the Container Storage Module. That is why I was looking at other documentation, but no luck.

As you can see, the CSM Operator is installed but there are no pods:

[root@localhost ~]# oc get subscriptions.operators.coreos.com --all-namespaces | grep dell
openshift-operators   dell-csm-operator-certified   dell-csm-operator-certified   certified-operators   stable

The NMState Operator was installed too, and 3 interfaces per host were configured to reach Dell PowerFlex, as per Red Hat OpenShift 4.14 deployment on Dell PowerFlex. I tried to follow the steps to configure the CSI drivers but it doesn't work. This is the latest document that I have found.

[root@localhost ~]# oc get subscriptions.operators.coreos.com --all-namespaces | grep nmstate
openshift-nmstate   kubernetes-nmstate-operator   kubernetes-nmstate-operator   redhat-operators   stable

[root@localhost ~]# oc get nncp
NAME                         STATUS      REASON
bond1-configurations-node1   Available   SuccessfullyConfigured
bond1-configurations-node2   Available   SuccessfullyConfigured
bond1-configurations-node3   Available   SuccessfullyConfigured

[root@localhost ~]# oc get secrets -n powerflex
NAME              TYPE     DATA   AGE
vxflexos-certs    Opaque   0      42h
vxflexos-config   Opaque   1      42h
vxflexos-creds    Opaque   1      42h

I have tried to create an instance of the Container Storage Module via the GUI using the GitHub sample (storage_csm_powerflex_v2101.yaml), but it fails immediately. I have customized the YAML to use my "powerflex" project/namespace instead of "vxflexos", but I get the same error.

I hope you can help me with this or point me to the next step after installing the Dell CSM Operator via OperatorHub.

Thanks in advance.

jooseppi-luna commented 2 months ago

@edga-silva got it, so the operator is there but you're not able to install the driver. Can you provide as much of the logs and output requested above as you can? The operator logs may tell us why the install is not working.

jooseppi-luna commented 2 months ago

So as much of this as you are able to gather:

  1. any relevant error output or logs with context
  2. output of oc get pods -n <operator namespace> && oc get pods -n <powerflex namespace>
  3. logs from the csm-operator controller: oc logs dell-csm-operator-controller-manager-XXXX-XX -n <operator namespace>
  4. logs from the driver controller pod: oc logs powerflex-controller-XXXX-XX -n powerflex
  5. logs from a driver worker pod: oc logs powerflex-node-XXXX-XX -n powerflex
  6. CSM object description: run oc get csm -A, then oc describe csm -n <NAMESPACE> <NAME> for each CSM object.

edga-silva commented 2 months ago

Thanks @jooseppi-luna for your help and patience. I am starting from scratch again in case I missed something or misunderstood the documentation, and I am copying all the steps and their output.

Step 1: Following the instructions for installing the csm-operator. I followed the "Manual Installation on a cluster without OLM" path instead of OperatorHub.

[root@localhost ed]# git clone -b v1.6.0 https://github.com/dell/csm-operator.git
Cloning into 'csm-operator'...
remote: Enumerating objects: 16114, done.
remote: Counting objects: 100% (3017/3017), done.
remote: Compressing objects: 100% (1264/1264), done.
remote: Total 16114 (delta 2033), reused 2520 (delta 1688), pack-reused 13097 (from 1)
Receiving objects: 100% (16114/16114), 4.62 MiB | 0 bytes/s, done.
Resolving deltas: 100% (10859/10859), done.
Note: checking out 'af9597868ef08575b7427dc57f0d7aa562086477'.

You are in 'detached HEAD' state. You can look around, make experimental changes and commit them, and you can discard any commits you make in this state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may do so (now or later) by using -b with the checkout command again. Example:

git checkout -b new_branch_name

[root@localhost ed]# cd csm-operator

[root@localhost csm-operator]# bash scripts/install.sh
Installing Dell Container Storage Modules Operator

Checking for existing installation of Dell Container Storage Modules Operator
Checking for existing installation
No resources found in dell-csm-operator namespace.
Success

Environment configuration
Kubernetes Version: 1.28
Openshift: true

Checking for kubectl installation Success
Checking for VolumeSnapshotClasses CRD Success
Checking for VolumeSnapshotContents CRD Success
Checking for VolumeSnapshots CRD Success
Checking if snapshot controller is deployed Success

Checking if namespace exists 'dell-csm-operator'
Namespace 'dell-csm-operator' doesn't exist
Creating namespace 'dell-csm-operator'

Installing Operator

Install/Update CRD
Warning: resource customresourcedefinitions/apexconnectivityclients..com is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
Warning: resource customresourcedefinitions/containerstoragemodules.storage.dell.com is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
Success

Install Operator Success

Checking deployment dell-csm-operator-controller-manager
Waiting up to 300 seconds to roll out.

Waiting for deployment "dell-csm-operator-controller-manager" rollout to finish: 0 of 1 updated replicas are available...
deployment "dell-csm-operator-controller-manager" successfully rolled out

Installation complete

[root@localhost csm-operator]# kubectl get pods -n dell-csm-operator
NAME                                                   READY   STATUS    RESTARTS   AGE
dell-csm-operator-controller-manager-cdc6b8786-gsbvt   1/1     Running   0          15m

=============================================

The CRD install produced a warning, but it seems to have finished successfully. In any case, I have attached the oc logs output for dell-csm-operator-controller-manager.

oc_logs_dell-csm-operator-controller-manager-cdc6b8786-gsbvt.log

Step 2: Following the instructions for installing the PowerFlex driver with csm-operator.

// List all Dell CSI drivers
[root@localhost cdrom]# kubectl get csm --all-namespaces
No resources found

// Create Namespace
[root@localhost cdrom]# kubectl create namespace vxflexos
namespace/vxflexos created

// Create Secret
[root@localhost ed]# kubectl create secret generic vxflexos-config -n vxflexos --from-file=config=config-secret.yaml
secret/vxflexos-config created

// Install Driver
// Create CR for PowerFlex using the sample file provided at https://github.com/dell/csm-operator/tree/main/samples/storage_csm_powerflex_v2101.yaml
[root@localhost ed]# kubectl create -f storage_csm_powerflex_v2101.yaml
containerstoragemodule.storage.dell.com/vxflexos created

// Check CSI drivers
[root@localhost ed]# kubectl get csm --all-namespaces
NAMESPACE   NAME       CREATIONTIME   CSIDRIVERTYPE   CONFIGVERSION   STATE
vxflexos    vxflexos   2m29s          powerflex       v2.10.1         Failed

PS C:\Users\edgardo\Desktop\openshift\cluster-1> oc get csm --all-namespaces
NAMESPACE   NAME       CREATIONTIME   CSIDRIVERTYPE   CONFIGVERSION   STATE
vxflexos    vxflexos   4m52s          powerflex       v2.10.1         Failed

PS C:\Users\edgardo\Desktop\openshift\cluster-1> oc get pods -n vxflexos
NAME                                  READY   STATUS             RESTARTS       AGE
vxflexos-controller-8f6459ff6-zwmt5   4/5     CrashLoopBackOff   21 (31s ago)   5m8s
vxflexos-node-5xpbz                   2/2     Running            4 (40s ago)    5m8s
vxflexos-node-6jwzg                   1/2     CrashLoopBackOff   5 (30s ago)    5m8s
vxflexos-node-qz2ss                   1/2     Error              6 (41s ago)    5m8s

PS C:\Users\edgardo\Desktop\openshift\cluster-1> oc logs dell-csm-operator-controller-manager-cdc6b8786-gsbvt -n dell-csm-operator > oc_logs_dell-csm-operator-controller-manager-cdc6b8786-gsbvt.log

PS C:\Users\edgardo\Desktop\openshift\cluster-1> oc logs vxflexos-controller-8f6459ff6-zwmt5 -n vxflexos
Defaulted container "attacher" out of: attacher, provisioner, snapshotter, resizer, driver
I0906 13:09:30.457114 1 connection.go:215] Connecting to unix:///var/run/csi/csi.sock
W0906 13:09:40.457340 1 connection.go:234] Still connecting to unix:///var/run/csi/csi.sock
W0906 13:10:00.458230 1 connection.go:234] Still connecting to unix:///var/run/csi/csi.sock
E0906 13:10:00.458302 1 main.go:136] context deadline exceeded

PS C:\Users\edgardo\Desktop\openshift\cluster-1> oc logs vxflexos-node-5xpbz -n vxflexos
Defaulted container "driver" out of: driver, registrar, sdc (init)
time="2024-09-06T13:10:19Z" level=info msg="configured 312d45b42e11404a" allSystemNames= endpoint="https://pfx-gui-vm.delldemolab.local:8443" isDefault=true nasName= password="****" skipCertificateValidation=true systemID=312d45b42e11404a user=admin
time="2024-09-06T13:10:19Z" level=info msg="driver configuration file " file=/vxflexos-config-params/driver-config-params.yaml
time="2024-09-06T13:10:19Z" level=info msg="Read CSI_LOG_FORMAT from log configuration file" format=
time="2024-09-06T13:10:19Z" level=info msg="CSI_LOG_FORMAT value not recognized, setting to text" format=
time="2024-09-06T13:10:19Z" level=info msg="array configuration file" file=/vxflexos-config/config
time="2024-09-06T13:10:19Z" level=info msg="set SDC GUID" guid=7649A2A3-F5D5-536B-A261-18B61D283AF0
time="2024-09-06T13:10:19Z" level=info msg="Found connected system" ID=6863f34d30f73c0f
time="2024-09-06T13:10:19Z" level=info msg="Probing all arrays. Number of arrays: 0"
time="2024-09-06T13:10:19Z" level=info msg="Probing all arrays. Number of arrays: 1"
time="2024-09-06T13:10:26Z" level=error msg="array 312d45b42e11404a probe failed: rpc error: code = FailedPrecondition desc = unable to login to VxFlexOS Gateway: EOF"
time="2024-09-06T13:10:26Z" level=info msg="configured csi-vxflexos.dellemc.com" ExternalAccess= IsApproveSDCEnabled=false IsHealthMonitorEnabled=false IsQuotaEnabled=false IsSdcRenameEnabled=false KubeNodeName=bm-host-01.cluster-1.delldemolab.local MaxVolumesPerNode=0 allowRWOMultiPodAccess=false autoprobe=true mode=node privatedir=/var/lib/kubelet/plugins/vxflexos.emc.dell.com/disks sdcGUID= sdcPrefix= thickprovision=false
time="2024-09-06T13:10:26Z" level=info msg="removed sock file" path=/var/lib/kubelet/plugins/vxflexos.emc.dell.com/csi_sock
time="2024-09-06T13:10:26Z" level=fatal msg="grpc failed" error="rpc error: code = FailedPrecondition desc = All arrays are not working. Could not proceed further: map[312d45b42e11404a:rpc error: code = FailedPrecondition desc = unable to login to VxFlexOS Gateway: EOF]"

PS C:\Users\edgardo\Desktop\openshift\cluster-1> oc logs vxflexos-node-6jwzg -n vxflexos
Defaulted container "driver" out of: driver, registrar, sdc (init)
time="2024-09-06T13:09:45Z" level=info msg="configured 312d45b42e11404a" allSystemNames= endpoint="https://pfx-gui-vm.delldemolab.local:8443" isDefault=true nasName= password="****" skipCertificateValidation=true systemID=312d45b42e11404a user=admin
time="2024-09-06T13:09:45Z" level=info msg="driver configuration file " file=/vxflexos-config-params/driver-config-params.yaml
time="2024-09-06T13:09:45Z" level=info msg="Read CSI_LOG_FORMAT from log configuration file" format=
time="2024-09-06T13:09:45Z" level=info msg="CSI_LOG_FORMAT value not recognized, setting to text" format=
time="2024-09-06T13:09:45Z" level=info msg="array configuration file" file=/vxflexos-config/config
time="2024-09-06T13:09:45Z" level=info msg="set SDC GUID" guid=EA946D29-F0F7-515E-B8DB-F8F65BAABD71
time="2024-09-06T13:09:45Z" level=info msg="Found connected system" ID=6863f34d30f73c0f
time="2024-09-06T13:09:45Z" level=info msg="Probing all arrays. Number of arrays: 0"
time="2024-09-06T13:09:45Z" level=info msg="Probing all arrays. Number of arrays: 1"
time="2024-09-06T13:10:00Z" level=error msg="array 312d45b42e11404a probe failed: rpc error: code = FailedPrecondition desc = unable to login to VxFlexOS Gateway: EOF"
time="2024-09-06T13:10:00Z" level=info msg="configured csi-vxflexos.dellemc.com" ExternalAccess= IsApproveSDCEnabled=false IsHealthMonitorEnabled=false IsQuotaEnabled=false IsSdcRenameEnabled=false KubeNodeName=bm-host-02.cluster-1.delldemolab.local MaxVolumesPerNode=0 allowRWOMultiPodAccess=false autoprobe=true mode=node privatedir=/var/lib/kubelet/plugins/vxflexos.emc.dell.com/disks sdcGUID= sdcPrefix= thickprovision=false
time="2024-09-06T13:10:00Z" level=info msg="removed sock file" path=/var/lib/kubelet/plugins/vxflexos.emc.dell.com/csi_sock
time="2024-09-06T13:10:00Z" level=fatal msg="grpc failed" error="rpc error: code = FailedPrecondition desc = All arrays are not working. Could not proceed further: map[312d45b42e11404a:rpc error: code = FailedPrecondition desc = unable to login to VxFlexOS Gateway: EOF]"

PS C:\Users\edgardo\Desktop\openshift\cluster-1> oc logs vxflexos-node-qz2ss -n vxflexos
Defaulted container "driver" out of: driver, registrar, sdc (init)
time="2024-09-06T13:12:12Z" level=info msg="configured 312d45b42e11404a" allSystemNames= endpoint="https://pfx-gui-vm.delldemolab.local:8443" isDefault=true nasName= password="****" skipCertificateValidation=true systemID=312d45b42e11404a user=admin
time="2024-09-06T13:12:12Z" level=info msg="driver configuration file " file=/vxflexos-config-params/driver-config-params.yaml
time="2024-09-06T13:12:12Z" level=info msg="Read CSI_LOG_FORMAT from log configuration file" format=
time="2024-09-06T13:12:12Z" level=info msg="CSI_LOG_FORMAT value not recognized, setting to text" format=
time="2024-09-06T13:12:12Z" level=info msg="array configuration file" file=/vxflexos-config/config
time="2024-09-06T13:12:12Z" level=info msg="set SDC GUID" guid=0557156D-1E1A-558A-845E-435F664D7884
time="2024-09-06T13:12:12Z" level=info msg="Found connected system" ID=6863f34d30f73c0f
time="2024-09-06T13:12:12Z" level=info msg="Probing all arrays. Number of arrays: 0"
time="2024-09-06T13:12:12Z" level=info msg="Probing all arrays. Number of arrays: 1"
time="2024-09-06T13:12:12Z" level=error msg="array 312d45b42e11404a probe failed: rpc error: code = FailedPrecondition desc = unable to login to VxFlexOS Gateway: EOF"
time="2024-09-06T13:12:12Z" level=info msg="configured csi-vxflexos.dellemc.com" ExternalAccess= IsApproveSDCEnabled=false IsHealthMonitorEnabled=false IsQuotaEnabled=false IsSdcRenameEnabled=false KubeNodeName=bm-host-03.cluster-1.delldemolab.local MaxVolumesPerNode=0 allowRWOMultiPodAccess=false autoprobe=true mode=node privatedir=/var/lib/kubelet/plugins/vxflexos.emc.dell.com/disks sdcGUID= sdcPrefix= thickprovision=false
time="2024-09-06T13:12:12Z" level=info msg="removed sock file" path=/var/lib/kubelet/plugins/vxflexos.emc.dell.com/csi_sock
time="2024-09-06T13:12:12Z" level=fatal msg="grpc failed" error="rpc error: code = FailedPrecondition desc = All arrays are not working. Could not proceed further: map[312d45b42e11404a:rpc error: code = FailedPrecondition desc = unable to login to VxFlexOS Gateway: EOF]"

PS C:\Users\edgardo\Desktop\openshift\cluster-1> oc get csm -A
NAMESPACE   NAME       CREATIONTIME   CSIDRIVERTYPE   CONFIGVERSION   STATE
vxflexos    vxflexos   9m29s          powerflex       v2.10.1         Failed

PS C:\Users\edgardo\Desktop\openshift\cluster-1> oc describe csm -n vxflexos > oc-describe-csm.log
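
To see at a glance which pods need attention in the oc get pods output above, the listing can be filtered for pods that are not fully ready. This is a minimal Python sketch under the assumption that the first three columns are NAME, READY and STATUS, as in the output above; the function name `unready_pods` is hypothetical.

```python
def unready_pods(table_text):
    """Return (name, ready, status) for pods that are not fully ready and Running."""
    flagged = []
    for line in table_text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue
        # Only the first three columns are used; RESTARTS may contain spaces.
        name, ready, status = fields[:3]
        ready_count, total = (int(part) for part in ready.split("/"))
        if ready_count < total or status != "Running":
            flagged.append((name, ready, status))
    return flagged


# Pod listing copied from the output above.
pods_output = """\
NAME                                  READY   STATUS             RESTARTS       AGE
vxflexos-controller-8f6459ff6-zwmt5   4/5     CrashLoopBackOff   21 (31s ago)   5m8s
vxflexos-node-5xpbz                   2/2     Running            4 (40s ago)    5m8s
vxflexos-node-6jwzg                   1/2     CrashLoopBackOff   5 (30s ago)    5m8s
vxflexos-node-qz2ss                   1/2     Error              6 (41s ago)    5m8s
"""

for pod in unready_pods(pods_output):
    print(pod)
```

Note that the controller log above was taken from the "attacher" container, which oc selected by default. Passing `-c driver` to `oc logs` should show the driver container of that pod instead.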

oc_logs_dell-csm-operator-controller-manager-cdc6b8786-gsbvt.log oc-describe-csm.log

I have attached some files with the command output.

Thanks again and let me know if you need more details.

Kind Regards, Ed

edga-silva commented 2 months ago

I am adding some screenshots from Dell PowerFlex: the CSI YAML has created the 3 SDCs, but for some reason it is not using the correct IP for Control Plane Node1 (10.18.X.108). Every node has 3 IPs:

Manual CSM Operator Install.docx

Let me know if you need more details.

Kind Regards, Ed

jooseppi-luna commented 2 months ago

@edga-silva thanks so much for all the logs! Very helpful -- to me it looks like you might have an array issue. According to these logs, the driver is not able to connect to the array:

time="2024-09-06T13:12:12Z" level=info msg="configured 312d45b42e11404a" allSystemNames= endpoint="https://pfx-gui-vm.delldemolab.local:8443/" isDefault=true nasName= password="********" skipCertificateValidation=true systemID=312d45b42e11404a user=admin
time="2024-09-06T13:12:12Z" level=info msg="driver configuration file " file=/vxflexos-config-params/driver-config-params.yaml
time="2024-09-06T13:12:12Z" level=info msg="Read CSI_LOG_FORMAT from log configuration file" format=
time="2024-09-06T13:12:12Z" level=info msg="CSI_LOG_FORMAT value not recognized, setting to text" format=
time="2024-09-06T13:12:12Z" level=info msg="array configuration file" file=/vxflexos-config/config
time="2024-09-06T13:12:12Z" level=info msg="set SDC GUID" guid=0557156D-1E1A-558A-845E-435F664D7884
time="2024-09-06T13:12:12Z" level=info msg="Found connected system" ID=6863f34d30f73c0f
time="2024-09-06T13:12:12Z" level=info msg="Probing all arrays. Number of arrays: 0"
time="2024-09-06T13:12:12Z" level=info msg="Probing all arrays. Number of arrays: 1"
time="2024-09-06T13:12:12Z" level=error msg="array 312d45b42e11404a probe failed: rpc error: code = FailedPrecondition desc = unable to login to VxFlexOS Gateway: EOF"
time="2024-09-06T13:12:12Z" level=info msg="configured csi-vxflexos.dellemc.com" ExternalAccess= IsApproveSDCEnabled=false IsHealthMonitorEnabled=false IsQuotaEnabled=false IsSdcRenameEnabled=false KubeNodeName=bm-host-03.cluster-1.delldemolab.local MaxVolumesPerNode=0 allowRWOMultiPodAccess=false autoprobe=true mode=node privatedir=/var/lib/kubelet/plugins/vxflexos.emc.dell.com/disks sdcGUID= sdcPrefix= thickprovision=false
time="2024-09-06T13:12:12Z" level=info msg="removed sock file" path=/var/lib/kubelet/plugins/vxflexos.emc.dell.com/csi_sock
time="2024-09-06T13:12:12Z" level=fatal msg="grpc failed" error="rpc error: code = FailedPrecondition desc = All arrays are not working. Could not proceed further: map[312d45b42e11404a:rpc error: code = FailedPrecondition desc = unable to login to VxFlexOS Gateway: EOF]"
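
For longer logs, the same lines can be picked out programmatically. The sketch below is a rough Python illustration, assuming the driver keeps the key=value format shown above; `parse_logfmt` and `summarize` are hypothetical helper names, and the sample lines are copied from the log excerpt.

```python
import shlex


def parse_logfmt(line):
    """Split one key=value log line into a dict; quoted values may contain spaces."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def summarize(lines):
    """Collect configured system IDs, connected system IDs, and error/fatal messages."""
    summary = {"configured": [], "connected": [], "problems": []}
    for line in lines:
        fields = parse_logfmt(line)
        msg = fields.get("msg", "")
        if msg.startswith("configured ") and "systemID" in fields:
            summary["configured"].append(fields["systemID"])
        elif msg == "Found connected system":
            summary["connected"].append(fields.get("ID"))
        if fields.get("level") in ("error", "fatal"):
            summary["problems"].append((fields["level"], msg))
    return summary


sample_lines = [
    'time="2024-09-06T13:12:12Z" level=info msg="configured 312d45b42e11404a" allSystemNames= endpoint="https://pfx-gui-vm.delldemolab.local:8443/" isDefault=true nasName= password="********" skipCertificateValidation=true systemID=312d45b42e11404a user=admin',
    'time="2024-09-06T13:12:12Z" level=info msg="Found connected system" ID=6863f34d30f73c0f',
    'time="2024-09-06T13:12:12Z" level=error msg="array 312d45b42e11404a probe failed: rpc error: code = FailedPrecondition desc = unable to login to VxFlexOS Gateway: EOF"',
]

print(summarize(sample_lines))
```

In this excerpt the configured systemID (312d45b42e11404a) differs from the ID on the "Found connected system" line (6863f34d30f73c0f). Nothing in the logs says whether that is related to the login failure, but it may be worth checking alongside the connection test.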

To test your connection, you can run this command from your master node: curl -k https://<PFLEX_USERNAME>:<PFLEX_PASSWORD>@<PFLEX_GATEWAY>/api/login and it should return a base64-encoded response, like this:

[root@master-1 ~]#  curl -k https://user:password@10.255.255.255/api/login
"xZgULQULYFfjKGDCBLQpHgUTTrnhHDgIocAdwLYukdbQqfcvTryfPbHQwkaThEpwUFTXbY"[root@master-1 ~]#

If you receive an error instead, then something is wrong with your array connection. Here are a few example errors:

# Bad password
[root@master-1 ~]#  curl -k https://user:BADPASS@10.255.255.255/api/login
{"message":"Unauthorized","httpStatusCode":401,"errorCode":0}[root@master-1 ~]#
# Nonexistent array
[root@master-1 ~]#  curl -k https://user:password@10.25.25.25/api/login
curl: (7) Failed to connect to 10.25.25.25 port 443: No route to host
[root@master-1 ~]#
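
The response bodies above can also be told apart in a script. The following Python sketch only interprets text that curl has already returned. It assumes the three outcomes shown above are representative, and the names `login_url` and `classify_login_response` are hypothetical.

```python
import json


def login_url(endpoint):
    """Build the /api/login URL from the endpoint configured for the driver."""
    return endpoint.rstrip("/") + "/api/login"


def classify_login_response(body):
    """Classify a response body from /api/login based on the examples above."""
    try:
        parsed = json.loads(body.strip())
    except json.JSONDecodeError:
        return "unexpected"  # empty body, HTML error page, curl error text, ...
    if isinstance(parsed, str) and parsed:
        return "token"  # a quoted string, as in the successful example
    if isinstance(parsed, dict) and parsed.get("httpStatusCode") == 401:
        return "unauthorized"  # as in the bad-password example
    return "unexpected"


print(login_url("https://pfx-gui-vm.delldemolab.local:8443/"))
print(classify_login_response('{"message":"Unauthorized","httpStatusCode":401,"errorCode":0}'))
```

When adapting the curl command, it is probably worth using the same host and port as the endpoint in the driver logs (port 8443 there), because the example addresses above go to the default HTTPS port.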

edga-silva commented 2 months ago

@jooseppi-luna, I know the logs suggest that the problem is the secret credentials, but that is not the case, as you can see from the document attached to my previous post. The YAML connected to Dell PowerFlex and created 3 SDCs, but for some reason it is not configuring the IPs correctly for Node1. Please have a look at the attached document and let me know if you still think the problem is the password. Bear in mind that I am using the same user and password to log in to PowerFlex.

Manual CSM Operator Install.docx

jooseppi-luna commented 1 month ago

@edga-silva -- it might be easiest to continue looking at this on a call, if that works for you. Feel free to email me at jooseppi_luna@dell.com to let me know some times that you are free, and I can set up the meeting.

edga-silva commented 1 month ago

Thanks @jooseppi-luna. I will send you an email soon.

AronAtDell commented 1 day ago

@jooseppi-luna , @edga-silva - Has this question been resolved? Thanks!