NetApp / trident

Storage orchestrator for containers
Apache License 2.0

Could not update Trident controller with node registration, will retry." error="could not log into the Trident CSI Controller: error communicating with Trident CSI Controller #752

Closed eselvam closed 2 years ago

eselvam commented 2 years ago

Describe the bug: We have an OpenShift cluster and are trying to install Trident. After the installation, only one of the CSI node pods is fully up. As a result, the node where our application pod is scheduled cannot get its volume from Trident CSI.

Pod error:

Events:
  Type     Reason              Age                From                     Message
  Normal   Scheduled           55s                default-scheduler        Successfully assigned icon-jenkins/jenkins-1-jb6jv to
  Warning  FailedAttachVolume  19s (x7 over 54s)  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-6e5e0db6-4893-444d-9bcb-116b8bbe3fe3" : CSINode does not contain driver csi.trident.netapp.io

Could not update Trident controller with node registration, will retry." error="could not log into the Trident CSI Controller: error communicating with Trident CSI Controller


To Reproduce: Install Trident using the offline operator method.

Expected behavior: only one container comes up in `oc get pods` in the trident project.


eselvam commented 2 years ago

oc get all

NAME                                    READY   STATUS    RESTARTS   AGE
pod/trident-csi-6748c8c6-ffd55          6/6     Running   0          25h
pod/trident-csi-ct922                   1/2     Running   0          32m
pod/trident-csi-czjbg                   1/2     Running   0          25h
pod/trident-csi-djfcv                   1/2     Running   0          25h
pod/trident-csi-gdwdb                   1/2     Running   0          24m
pod/trident-csi-nl57h                   1/2     Running   0          25h
pod/trident-csi-pzwwl                   2/2     Running   0          25h
pod/trident-operator-58bd566749-c5nbb   1/1     Running   0          25h

NAME                  TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)              AGE
service/trident-csi   ClusterIP   172.30.56.179   <none>        34571/TCP,9220/TCP   26h

NAME                         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                     AGE
daemonset.apps/trident-csi   6         6         1       6            1           kubernetes.io/arch=amd64,kubernetes.io/os=linux   26h

NAME                               READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/trident-csi        1/1     1            1           26h
deployment.apps/trident-operator   1/1     1            1           26h

NAME                                          DESIRED   CURRENT   READY   AGE
replicaset.apps/trident-csi-6748c8c6          1         1         1       26h
replicaset.apps/trident-operator-58bd566749   1         1         1       26h

eselvam commented 2 years ago

I got the error below in the driver-registrar container log:

W0804 13:42:01.784637 1249400 connection.go:173] Still connecting to unix:///plugin/csi.sock
W0804 13:42:11.784655 1249400 connection.go:173] Still connecting to unix:///plugin/csi.sock
W0804 13:42:21.785179 1249400 connection.go:173] Still connecting to unix:///plugin/csi.sock

Is it related to the network policy in the OpenShift project?

eselvam commented 2 years ago

It seems to be a network policy issue. We have an allow-same-namespace policy, but the Trident pods run on all the nodes, so it is not working. How can we fix it?

apiVersion: v1 items:

gnarl commented 2 years ago

Hi @eselvam,

The cluster networking for your Kubernetes cluster is something you should investigate. The Trident daemonset Pods that run on the Kubernetes worker nodes need to register with the Trident controller for Trident to work properly. This NetApp KB article describes how to test connectivity.
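A rough way to check basic reachability from a worker node is a plain TCP connect to the Trident controller service. The sketch below is not the procedure from the KB article. It assumes the controller is reached at the ClusterIP and first port shown in the `oc get all` output earlier in this thread (172.30.56.179, port 34571), and `can_reach` is a hypothetical helper name.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only proves that the TCP handshake completes. It does not prove
    that the Trident node registration itself would succeed.
    """
    try:
        # create_connection resolves the host and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Run on an affected worker node, `can_reach("172.30.56.179", 34571)` returning `False` would point at something between the node and the service, such as a network policy, rather than at Trident itself.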

Helping with your particular cluster configuration isn't something we're able to do via GitHub issues; this forum isn't well suited to that kind of support. If you need additional help troubleshooting why the registration isn't working, I encourage you to contact NetApp support.

eselvam commented 2 years ago

It is not a cluster networking problem. It is caused by the network policy defined at the project (i.e. namespace) level by the default template. That policy allows only same-namespace traffic, but somehow the pods still cannot reach the Trident service IP. Once we removed the network policy, it started working. We have a case open with RedHat to check how to fix this without removing the network policy in the trident namespace. Thanks.
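One possible explanation, not confirmed anywhere in this thread: if the Trident node pods use host networking, their traffic arrives from the node rather than from a pod in the namespace, so an allow-same-namespace rule would not match it. The sketch below builds an additional NetworkPolicy that admits host-network ingress. It assumes the `policy-group.network.openshift.io/host-network` namespace label that OpenShift documents for OVN-Kubernetes, and a namespace named `trident`. The function name is hypothetical, and clusters on other network plugins may need a different selector.

```python
import json


def host_network_ingress_policy(namespace: str) -> dict:
    """Build a NetworkPolicy manifest allowing ingress from host-network traffic.

    Assumption: the cluster labels host-network traffic with
    policy-group.network.openshift.io/host-network (OVN-Kubernetes).
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-from-hostnetwork", "namespace": namespace},
        "spec": {
            # An empty podSelector applies the policy to every pod in the namespace.
            "podSelector": {},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    "from": [
                        {
                            "namespaceSelector": {
                                "matchLabels": {
                                    "policy-group.network.openshift.io/host-network": ""
                                }
                            }
                        }
                    ]
                }
            ],
        },
    }


if __name__ == "__main__":
    # "trident" is an assumed namespace name; adjust to the actual project.
    print(json.dumps(host_network_ingress_policy("trident"), indent=2))
```

The printed JSON can be piped to `oc apply -f -`. Keeping this as a separate policy leaves the template's same-namespace rule untouched, since NetworkPolicy rules are additive.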

gnarl commented 2 years ago

@eselvam, closing this issue since you have a support issue open with RedHat. This issue can be reopened later if necessary.

pmokrz commented 2 years ago

Hi @eselvam, do you have any info from RedHat on this problem? I have the same problem as described above, on Fedora CoreOS.

eselvam commented 2 years ago

Since I fixed the issue by removing the network policy, I did not reach out to RedHat.