dell / csm

Dell Container Storage Modules (CSM)
Apache License 2.0

[QUESTION]: Post upgrade of Dell CSI Operator on OpenShift from 1.8 to 1.10, it is not working anymore #647

Closed soudamsugit closed 1 year ago

soudamsugit commented 1 year ago

How can the Team help you today?

Details: ?

After upgrading the Dell CSI Operator on OpenShift from 1.8 to 1.10, it is no longer working; below are the errors from the pod logs. Can you please advise on the upgrade path?

New configs seem to have been introduced, and this has broken the operator.

```
W0207 05:26:51.893435 1 feature_gate.go:237] Setting GA feature gate Topology=true. It will be removed in a future release.
I0207 05:26:51.893502 1 feature_gate.go:245] feature gates: &{map[Topology:true]}
I0207 05:26:51.893523 1 csi-provisioner.go:150] Version: v3.3.0
I0207 05:26:51.893530 1 csi-provisioner.go:173] Building kube configs for running in cluster...
I0207 05:26:51.894296 1 connection.go:154] Connecting to unix:///var/run/csi/csi.sock
W0207 05:27:01.894795 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:27:11.894500 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:27:21.894595 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:27:31.894767 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:27:41.895241 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:27:51.894623 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:28:01.894734 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:28:11.894726 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:28:21.895019 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:28:31.894834 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:28:41.900501 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:28:51.894738 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
E0207 05:28:55.581158 1 connection.go:132] Lost connection to unix:///var/run/csi/csi.sock.
F0207 05:28:55.581253 1 connection.go:87] Lost connection to CSI driver, exiting

I0207 05:26:52.646149 1 main.go:94] Version: v4.0.0
I0207 05:26:52.647924 1 connection.go:154] Connecting to unix:///var/run/csi/csi.sock
W0207 05:27:02.648876 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:27:12.648497 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:27:22.648572 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:27:32.648326 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:27:42.648893 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:27:52.648375 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:28:02.648705 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:28:12.648568 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:28:22.648351 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:28:32.648074 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:28:42.655753 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:28:52.659526 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
E0207 05:28:55.326594 1 connection.go:132] Lost connection to unix:///var/run/csi/csi.sock.
F0207 05:28:55.326693 1 connection.go:87] Lost connection to CSI driver, exiting

I0207 05:26:53.281276 1 main.go:104] Version: v6.1.0
I0207 05:26:53.283238 1 connection.go:154] Connecting to unix:///var/run/csi/csi.sock
W0207 05:27:03.284144 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:27:13.283802 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:27:23.284330 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:27:33.283431 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:27:43.284224 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:27:53.283390 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:28:03.283487 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:28:13.283914 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:28:23.283488 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:28:33.283333 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:28:43.290502 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:28:53.292500 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
E0207 05:28:55.718412 1 connection.go:132] Lost connection to unix:///var/run/csi/csi.sock.
F0207 05:28:55.718525 1 connection.go:87] Lost connection to CSI driver, exiting

I0207 05:26:53.964398 1 main.go:93] Version : v1.6.0
I0207 05:26:53.964434 1 feature_gate.go:245] feature gates: &{map[]}
I0207 05:26:53.967169 1 connection.go:154] Connecting to unix:///var/run/csi/csi.sock
W0207 05:27:03.967793 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:27:13.967656 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:27:23.967490 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:27:33.967903 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:27:43.967257 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:27:53.968121 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:28:03.967686 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:28:13.967682 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:28:23.967283 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:28:33.967509 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:28:43.976492 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
W0207 05:28:53.967290 1 connection.go:173] Still connecting to unix:///var/run/csi/csi.sock
E0207 05:28:54.977596 1 connection.go:132] Lost connection to unix:///var/run/csi/csi.sock.
F0207 05:28:54.977683 1 connection.go:87] Lost connection to CSI driver, exiting
```

soudamsugit commented 1 year ago

Hello Team,

This is a Certified Operator that we are using; can you please let us know how we can reach you on a call? This is a blocker for us.

rensyct commented 1 year ago

@soudamsugit, we work in the IST timezone and can get on a call in half an hour, at 12:00 PM IST. Please let us know if that works for you.

soudamsugit commented 1 year ago

@rensyct, 12 PM IST will be great. Also, please let me know if you will be able to join a Teams call, and I can paste the link.

rensyct commented 1 year ago

Yes @soudamsugit, we will be able to join the Teams call.

soudamsugit commented 1 year ago

@rensyct https://teams.microsoft.com/l/meetup-join/19%3ameeting_NjM0Y2UxODMtNDg1MC00ZTExLTkyZWEtYzg3Njg3NGNkMjg4%40thread.v2/0?context=%7b%22Tid%22%3a%221f4f7eda-6e51-425e-a0f9-4c2fcef58a52%22%2c%22Oid%22%3a%2263a774a0-2e52-48bd-b0db-edde453cc553%22%7d

soudamsugit commented 1 year ago

Hello @rensyct,

Thanks for your time yesterday. I have gone through this with the storage team; the credentials look fine.

Also, I have tested them from inside the pod and they seem to work fine:

```
[soudamv@qocpa001l ~/powermax/2.3] $ oc rsh powermax-controller-5ccddb4b7-mn2hf
sh-4.4$ curl -kv -u username:password https://10.204.214.25:8443/univmax/restapi/version
sh-4.4$ curl -kv -u username:password https://10.204.214.25:8443/univmax/restapi/91/system/symmetrix/000497600159
```

soudamsugit commented 1 year ago

@rensyct, I believe some configuration is missing. Can we please connect at 10 AM IST today?

rensyct commented 1 year ago

Hi @soudamsugit, thank you for your response. Since the operator part is working as expected now and the current issue is related to PowerMax, we will need to have the PowerMax team on the call. I will reach out to them and let you know a time that works for the PowerMax team.

soudamsugit commented 1 year ago

Thanks a lot @rensyct. Sorry for chasing; I would appreciate it if you could prioritise this issue.

rensyct commented 1 year ago

Np @soudamsugit, I understand. This issue is a priority for us. Will keep you posted on a time when we can connect today.

rensyct commented 1 year ago

@soudamsugit, the team is available at 11:00 AM IST. Does that time work for you?

soudamsugit commented 1 year ago

Sure, we can have a chat at 11 AM.

soudamsugit commented 1 year ago

https://teams.microsoft.com/l/meetup-join/19%3ameeting_NjM0Y2UxODMtNDg1MC00ZTExLTkyZWEtYzg3Njg3NGNkMjg4%40thread.v2/0?context=%7b%22Tid%22%3a%221f4f7eda-6e51-425e-a0f9-4c2fcef58a52%22%2c%22Oid%22%3a%2263a774a0-2e52-48bd-b0db-edde453cc553%22%7d

prablr79 commented 1 year ago

@rensyct any update on this case ?

soudamsugit commented 1 year ago

@rensyct, sorry for bugging you. I still see a few pods in CrashLoopBackOff:

```
[soudamv@qocpa001l ~/powermax/2.3] $ oc get pods
NAME                                  READY   STATUS             RESTARTS        AGE
powermax-controller-b8f87cdb6-jnmv5   5/5     Running            0               21m
powermax-controller-b8f87cdb6-qq5tf   5/5     Running            0               21m
powermax-node-2gc4d                   2/2     Running            0               21m
powermax-node-6vn8f                   1/2     CrashLoopBackOff   8 (2m7s ago)    21m
powermax-node-97b8j                   1/2     CrashLoopBackOff   8 (77s ago)     21m
powermax-node-fldjp                   1/2     CrashLoopBackOff   8 (2m31s ago)   21m
powermax-node-g4krm                   2/2     Running            0               21m
powermax-node-j9n86                   2/2     Running            0               21m
powermax-node-nr6n5                   1/2     CrashLoopBackOff   8 (4m9s ago)    21m
powermax-node-nwlzk                   2/2     Running            0               21m
powermax-node-qxt4v                   1/2     CrashLoopBackOff   8 (71s ago)     21m
powermax-node-r6rl7                   1/2     CrashLoopBackOff   8 (95s ago)     21m
powermax-node-x9h8l                   2/2     Running            0               21m
powermax-node-xjsw5                   2/2     Running            0               21m
powermax-node-zblfk                   1/2     CrashLoopBackOff   8 (2m42s ago)   21m
powermax-node-zhcjb                   1/2     CrashLoopBackOff   8 (3m32s ago)   21m
```

Error:

```
{"level":"error","msg":"No topology keys could be generated","time":"2023-02-08T07:19:51.188623463Z"}
{"level":"info","msg":"/csi.v1.Node/NodeGetInfo: REP 0016: rpc error: code = FailedPrecondition desc = no topology keys could be generate","time":"2023-02-08T07:19:51.188642894Z"}
{"level":"info","msg":"/csi.v1.Identity/GetPluginInfo: REQ 0017: XXX_NoUnkeyedLiteral={}, XXX_sizecache=0","time":"2023-02-08T07:24:57.360533138Z"}
{"level":"info","msg":"/csi.v1.Identity/GetPluginInfo: REP 0017: Name=csi-powermax.dellemc.com, VendorVersion=2.3.0, Manifest=map[commit:9ee4c6afaedb58725f6518239fc527e076c3e0a9 formed:Tue, 14 Jun 202
{"level":"info","msg":"/csi.v1.Node/NodeGetInfo: REQ 0018: XXX_NoUnkeyedLiteral={}, XXX_sizecache=0","time":"2023-02-08T07:24:59.336730277Z"}
{"level":"error","msg":"Couldn't find any ip interfaces on any of the port-groups","time":"2023-02-08T07:24:59.336778092Z"}
{"level":"error","msg":"No topology keys could be generated","time":"2023-02-08T07:24:59.336785694Z"}
{"level":"info","msg":"/csi.v1.Node/NodeGetInfo: REP 0018: rpc error: code = FailedPrecondition desc = no topology keys could be generate","time":"2023-02-08T07:24:59.336800696Z"}
```
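The `no topology keys` failure can also be cross-checked from the cluster side: a node whose driver pod never finished registering will have no topology keys for `csi-powermax.dellemc.com` (the driver name from the `GetPluginInfo` log above) in its CSINode object. A minimal sketch, assuming `oc` is on the PATH and you are logged in to the cluster:

```shell
# Sketch: list, per node, which CSI drivers are registered and their
# topology keys. Nodes with a crash-looping powermax-node pod will show
# no csi-powermax.dellemc.com entry, or an entry with empty keys.
show_topology_keys() {
  if command -v oc >/dev/null 2>&1; then
    oc get csinode \
      -o custom-columns=NODE:.metadata.name,DRIVERS:.spec.drivers[*].name,TOPOLOGY_KEYS:.spec.drivers[*].topologyKeys \
      2>/dev/null || echo "oc get csinode failed (are you logged in to the cluster?)"
  else
    echo "oc not found; run this from a workstation logged in to the cluster"
  fi
}
show_topology_keys
```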

boyamurthy commented 1 year ago

Hi @soudamsugit, the nodes that are failing likely have FC connection issues, which is why no topology keys are being created. Could you please check the FC/iSCSI connection for those nodes?
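For reference, a quick way to run this check from the node itself (e.g. via `oc debug node/<node>`); this is a sketch using the standard Linux sysfs `fc_host` class and `iscsiadm`, not a Dell-specific tool:

```shell
# Sketch: check FC link state and iSCSI sessions on a worker node.
# FC: every HBA port should report "Online"; "Linkdown" suggests a
# cabling or zoning problem. iSCSI: at least one session should exist
# to the array's front-end ports.
check_fc_iscsi() {
  found=0
  for f in /sys/class/fc_host/host*/port_state; do
    [ -e "$f" ] || continue
    found=1
    echo "FC $(basename "$(dirname "$f")"): $(cat "$f")"
  done
  [ "$found" -eq 1 ] || echo "no FC HBAs visible on this host"
  if command -v iscsiadm >/dev/null 2>&1; then
    iscsiadm -m session 2>&1 || echo "no active iSCSI sessions"
  else
    echo "iscsiadm not installed on this host"
  fi
}
check_fc_iscsi
```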

soudamsugit commented 1 year ago

Thanks @boyamurthy. Can you please point me to any documentation on checking the FC/iSCSI connection for the nodes?

boyamurthy commented 1 year ago

Hi @soudamsugit, please find the prerequisites documentation here: https://dell.github.io/csm-docs/docs/csidriver/installation/helm/powermax/#prerequisites

soudamsugit commented 1 year ago

Hello @boyamurthy

I have verified all the nodes having issues as per the documentation below, and everything seems fine, but I am still getting errors.

Set up the iSCSI initiators as follows:

soudamsugit commented 1 year ago

```
sh-4.4# rpm -qa | grep iscsi
iscsi-initiator-utils-iscsiuio-6.2.1.2-1.gita8fcb37.el8.x86_64
iscsi-initiator-utils-6.2.1.2-1.gita8fcb37.el8.x86_64
sh-4.4# cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.1994-05.com.redhat:cfed26c7f7aa
sh-4.4# curl -v telnet://10.204.xxx.xx:8443
```

rensyct commented 1 year ago

@soudamsugit, could we connect over a call so that the PowerMax driver team can take a look at the environment and check the issue?

soudamsugit commented 1 year ago

Thanks @rensyct @delldubey

Can we please connect tomorrow 9 AM IST on this issue?

rensyct commented 1 year ago

Hi @soudamsugit, We can connect at 10:30 AM tomorrow. Please let us know if that time works for you

soudamsugit commented 1 year ago

Thanks @rensyct, 10:30 AM works fine

soudamsugit commented 1 year ago

Here is the invite, can you please join: https://teams.microsoft.com/l/meetup-join/19%3ameeting_NjM0Y2UxODMtNDg1MC00ZTExLTkyZWEtYzg3Njg3NGNkMjg4%40thread.v2/0?context=%7b%22Tid%22%3a%221f4f7eda-6e51-425e-a0f9-4c2fcef58a52%22%2c%22Oid%22%3a%2263a774a0-2e52-48bd-b0db-edde453cc553%22%7d

rensyct commented 1 year ago

Sure @soudamsugit

rensyct commented 1 year ago

Hi @soudamsugit

Posting the updates after today's call: @boyamurthy figured out that the FC HBAs (WWNs) do not have connectivity/zoning to the respective storage array. Hence the driver fails to create the initiator group and topology keys on the nodes where it is in CrashLoopBackOff state. We do not see a failure on the nodes where the zoning is done properly.

Requested @soudamsugit to reach out to the storage team on their end and ensure that the zoning is done properly between the array and the hosts.
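For anyone hitting the same issue: the initiator WWPNs that the storage team needs in order to verify zoning can be read straight from sysfs on each node. A sketch using the standard Linux `fc_host` class (run on the node, e.g. via `oc debug node/<node>`):

```shell
# Sketch: print each FC HBA's WWPN so the storage team can confirm zoning
# between these initiators and the PowerMax front-end ports.
list_wwpns() {
  found=0
  for f in /sys/class/fc_host/host*/port_name; do
    [ -e "$f" ] || continue
    found=1
    echo "$(basename "$(dirname "$f")") WWPN: $(cat "$f")"
  done
  [ "$found" -eq 1 ] || echo "no fc_host entries found on this host"
}
list_wwpns
```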

boyamurthy commented 1 year ago

Hi @soudamsugit, if it is working fine now, could you please close this issue?

soudamsugit commented 1 year ago

Hello @rensyct, thank you for the support you provided. I am closing this case. I appreciate the support from you all.

Regards, Suresh

rensyct commented 1 year ago

Thank you @soudamsugit for the confirmation and for closing the case. Thank you @boyamurthy and @delldubey for helping with the queries and looking into the configuration issues related to the PowerMax driver.