Closed LeoHu1985 closed 2 years ago
@spriya-m can you check this ?
I can share the environment access thru zoom if needed, very easy to recreate the issue.
Hi @LeoHu1985, we have tried to reproduce this issue in our environment, and there the application pods reach the Running state. Can you please share the logs of the driver node pod on the node where the application is scheduled? It will help us debug further. Thank you
Thanks @satyakonduri for your kind reply
It's interesting: the CSI driver node pods went into a crash loop after I deployed an application using the PowerStore storage class.
k8s-0-b-1:/home/lab/applications/yugabyte # kubectl get pods -n csi-powerstore
NAME                                     READY   STATUS             RESTARTS        AGE
powerstore-controller-868c59b4ff-fxhc8   7/7     Running            0               8d
powerstore-controller-868c59b4ff-pnldl   7/7     Running            0               8d
powerstore-node-dfnnf                    1/2     CrashLoopBackOff   8 (3m1s ago)    8d
powerstore-node-jk8l2                    1/2     CrashLoopBackOff   8 (3m6s ago)    8d
I have attached the pod log for the CSI driver pod powerstore-node-dfnnf: csi-pod-log.txt
I have also attached the pod description for the application pod: app-pod-des.txt
Hi @LeoHu1985,
Thank you for providing the logs.
Can you please share the output of the nvme list-subsys -o json command from the nodes where you are seeing the crash?
Thank You.
@satyakonduri Here you go, output from all 3 nodes in the k8s cluster
k8s-0-b-1:/home/lab/applications/yugabyte # nvme list-subsys -o json
{
  "Subsystems" : [
    {
      "Name" : "nvme-subsys0",
      "NQN" : "nqn.2014-08.org.nvmexpress:uuid:526fee38-9395-d874-f1c9-1d1df9960db1",
      "Paths" : [
        {
          "Name" : "nvme0",
          "Transport" : "pcie",
          "Address" : "0000:13:00.0",
          "State" : "live"
        }
      ]
    },
    {
      "Name" : "nvme-subsys1",
      "NQN" : "nqn.1988-11.com.dell:powerstore:00:24eff00c8688DFE57BE0",
      "Paths" : [
        {
          "Name" : "nvme1",
          "Transport" : "tcp",
          "Address" : "traddr=x.x.x.x trsvcid=4420",
          "State" : "live"
        },
        {
          "Name" : "nvme2",
          "Transport" : "tcp",
          "Address" : "traddr=x.x.x.x trsvcid=4420",
          "State" : "live"
        }
      ]
    }
  ]
}
k8s-0-b-2:~ # nvme list-subsys -o json
{
  "Subsystems" : [
    {
      "Name" : "nvme-subsys0",
      "NQN" : "x.x.x.x:uuid:526d57eb-b599-669b-3de3-9a045b5aae98",
      "Paths" : [
        {
          "Name" : "nvme0",
          "Transport" : "pcie",
          "Address" : "0000:13:00.0",
          "State" : "live"
        }
      ]
    },
    {
      "Name" : "nvme-subsys1",
      "NQN" : "nqn.1988-11.com.dell:powerstore:00:24eff00c8688DFE57BE0",
      "Paths" : [
        {
          "Name" : "nvme1",
          "Transport" : "tcp",
          "Address" : "traddr=x.x.x.x trsvcid=4420",
          "State" : "live"
        },
        {
          "Name" : "nvme2",
          "Transport" : "tcp",
          "Address" : "traddr=x.x.x.x trsvcid=4420",
          "State" : "live"
        }
      ]
    }
  ]
}
k8s-0-b-3:~ # nvme list-subsys -o json
{
  "Subsystems" : [
    {
      "Name" : "nvme-subsys0",
      "NQN" : "x.x.x.x:uuid:5229a405-dfd5-775b-0c75-c2a46dda3624",
      "Paths" : [
        {
          "Name" : "nvme0",
          "Transport" : "pcie",
          "Address" : "0000:13:00.0",
          "State" : "live"
        }
      ]
    },
    {
      "Name" : "nvme-subsys1",
      "NQN" : "x.x.x.x:powerstore:00:24eff00c8688DFE57BE0",
      "Paths" : [
        {
          "Name" : "nvme1",
          "Transport" : "tcp",
          "Address" : "traddr=x.x.x.x trsvcid=4420",
          "State" : "live"
        },
        {
          "Name" : "nvme2",
          "Transport" : "tcp",
          "Address" : "traddr=x.x.x.x trsvcid=4420",
          "State" : "live"
        }
      ]
    }
  ]
}
Again if you need to hop on my live environment to further check, feel free to let me know, I can do a zoom share.
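For anyone comparing nvme list-subsys -o json output across several nodes, a small script can condense it to one line per subsystem. The following is only a sketch: it assumes the JSON shape shown in this thread (a top-level "Subsystems" array with "Paths" entries), which may differ in other nvme-cli versions, and the function name summarize_subsystems is hypothetical. It does not reproduce how the CSI driver itself reads this output.

```python
import json

# Sample taken from the k8s-0-b-1 output in this thread (addresses redacted as posted).
SAMPLE = """
{ "Subsystems" : [
  { "Name" : "nvme-subsys0",
    "NQN" : "nqn.2014-08.org.nvmexpress:uuid:526fee38-9395-d874-f1c9-1d1df9960db1",
    "Paths" : [
      { "Name" : "nvme0", "Transport" : "pcie", "Address" : "0000:13:00.0", "State" : "live" } ] },
  { "Name" : "nvme-subsys1",
    "NQN" : "nqn.1988-11.com.dell:powerstore:00:24eff00c8688DFE57BE0",
    "Paths" : [
      { "Name" : "nvme1", "Transport" : "tcp", "Address" : "traddr=x.x.x.x trsvcid=4420", "State" : "live" },
      { "Name" : "nvme2", "Transport" : "tcp", "Address" : "traddr=x.x.x.x trsvcid=4420", "State" : "live" } ] }
] }
"""


def summarize_subsystems(raw):
    """Return one summary dict per subsystem (hypothetical helper).

    Assumes the layout shown in this thread: {"Subsystems": [{"Paths": [...]}]}.
    """
    data = json.loads(raw)
    summary = []
    for subsys in data.get("Subsystems", []):
        paths = subsys.get("Paths", [])
        summary.append({
            "name": subsys.get("Name"),
            "nqn": subsys.get("NQN"),
            # Distinct transports used by this subsystem's paths (e.g. pcie, tcp).
            "transports": sorted({p.get("Transport", "unknown") for p in paths}),
            # Count of paths whose State field is exactly "live".
            "live_paths": sum(1 for p in paths if p.get("State") == "live"),
            "total_paths": len(paths),
        })
    return summary


if __name__ == "__main__":
    for s in summarize_subsystems(SAMPLE):
        print(s["name"], s["nqn"], s["transports"],
              f'{s["live_paths"]}/{s["total_paths"]} live')
```

On a real node, the same function could be fed the captured command output instead of SAMPLE, which makes it easier to spot a node whose PowerStore subsystem has fewer live TCP paths than the others.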
Hi @LeoHu1985. We'd be glad to work with you further on this. Have you joined our Slack group? If not, please do so here: https://app.smartsheet.com/b/form/e99b4d2da13e42518df4d3307c010f47. We can then work directly with you there to further troubleshoot this issue. Thanks.
@gallacher thanks, I have just submitted the Slack group access request form. Looking forward to working with you guys directly.
Hi @LeoHu1985, my apologies but we are experiencing a slight delay in processing our Slack access requests.
@LeoHu1985, please send an email to karavi@dell.com to work with someone directly on this. Thanks!
PRs with the fix for this issue have been merged. Please use the csi-powerstore nightly image from Docker Hub.
How can the Team help you today?
PowerStore CSI driver with NVMe TCP connectivity: volume attach succeeds, but the mount fails with this error:
Warning FailedMount 8s (x8 over 72s) kubelet MountVolume.MountDevice failed for volume "csivol-a3491527fc" : rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /var/lib/kubelet/plugins/csi-powerstore.dellemc.com/csi_sock: connect: connection refused"
Details:
SLES 15 SP3 VMs running on ESXi 7.0.3 (all NVMe TCP enabled, NVMe module installed)
K8S 1.24.6 (3-VM cluster)
PowerStore CSI driver 2.4
The CSI driver installed successfully with NVMe TCP topology and added the NVMe hosts to PowerStore with their NVMe TCP initiator NQNs. When I created PVCs and applications, the driver automatically created volumes on PowerStore and mapped them to the NVMe hosts in the K8S cluster. The k8s hosts showed those NVMe volumes in nvme list / fdisk -l output, k8s showed the PVCs created and bound, and the volumes attached to the pods successfully.
However, the application pods are stuck in the ContainerCreating phase because the volume mounts fail with this error:
Warning FailedMount 8s (x8 over 72s) kubelet MountVolume.MountDevice failed for volume "csivol-a3491527fc" : rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /var/lib/kubelet/plugins/csi-powerstore.dellemc.com/csi_sock: connect: connection refused"
I also manually mounted a test volume from the same PowerStore on the k8s VM's Linux OS via NVMe TCP; mkfs.ext4 and mount both completed successfully.
Any advice to fix this would be appreciated.