flannel-io / flannel

flannel is a network fabric for containers, designed for Kubernetes

Connectivity Issue Between Pods on Different Nodes in Kubernetes #2049

Open Medamine216 opened 2 months ago

Medamine216 commented 2 months ago

I am facing an issue with connectivity between pods that are running on different nodes in my Kubernetes cluster. Pods on the same node can communicate without any problems, but communication fails when the pods are on different nodes.

Here are some details about my environment:

I'm running a Kubernetes cluster with multiple nodes (node1, node2). Each node has a CNI plugin configured; I'm using Flannel. Pods on the same node can ping each other successfully, but when I try to ping a pod on a different node, I get 100% packet loss. The CNI plugin seems to be running without errors in the logs.
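A few checks that can help confirm the flannel daemon and the VXLAN device themselves are healthy (assuming flannel runs in the kube-flannel namespace, the default for recent manifests; older ones use kube-system):

$ kubectl get pods -n kube-flannel -o wide            # expect one Running kube-flannel-ds pod per node
$ kubectl logs -n kube-flannel <kube-flannel-pod>     # look for subnet lease messages and route errors
$ ip -d link show flannel.1                           # on each node: should show vxlan id 1 with dstport 8472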

Here are the commands and results I used to test the connectivity:

vagrant@master:~$ kubectl get pods -o wide
NAME                                READY   STATUS    RESTARTS       AGE   IP           NODE    NOMINATED NODE   READINESS GATES
debug-pod                           1/1     Running   2 (166m ago)   13h   10.244.1.4   node1   <none>           <none>
nginx-deployment-7c79c4bf97-6c77m   1/1     Running   0              13h   10.244.2.3   node2   <none>           <none>
nginx-deployment-7c79c4bf97-ct2hc   1/1     Running   0              13h   10.244.1.3   node1   <none>           <none>
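The pod IPs suggest node1 owns 10.244.1.0/24 and node2 owns 10.244.2.0/24. Assuming node CIDR allocation by kube-controller-manager (the kubeadm default), this can be cross-checked against each node's podCIDR:

$ kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'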

vagrant@master:~$ kubectl exec -it debug-pod -- ping 10.244.2.3
PING 10.244.2.3 (10.244.2.3): 56 data bytes
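One way to see whether the encapsulated packets ever leave the sending node is to capture flannel's VXLAN traffic (UDP port 8472 by default) on the underlying interface while the ping runs. The interface name eth0 is an assumption; on Vagrant boxes the private network is often eth1, and flannel may need its --iface option pointed at that interface:

$ sudo tcpdump -ni eth0 udp port 8472   # run on both node1 and node2 during the ping

If packets show up on the sender but never arrive on the receiver, the problem is in the underlay network (firewall, wrong interface) rather than in flannel's routing.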

Route Master:

vagrant@master:~$ ip route | grep flannel.1
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink 
10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink 

Route Node 1:

vagrant@node1:~$ ip route | grep flannel.1
10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink 
10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink 

Route Node 2:

vagrant@node2:~$ ip route | grep flannel.1
10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink 
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink
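With the VXLAN backend, these routes alone are not enough: each node also needs an FDB entry and a permanent neighbour entry for every peer's flannel.1 address, both of which flannel programs itself. As a sketch, on each node:

$ bridge fdb show dev flannel.1   # should list each peer's flannel.1 MAC with dst <peer node IP>
$ ip neigh show dev flannel.1     # PERMANENT entries mapping 10.244.x.0 to the peer MACs

A missing or stale entry here would explain routes that look correct while traffic still blackholes.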

I'm running out of ideas on what could be causing this issue. Is there anything else I should check to resolve the connectivity loss between nodes?

Thank you in advance for your help!

rbrtbnfgl commented 1 month ago

Could you try to disable the offload on both nodes?

$ ethtool --offload eth0 rx off tx off
$ ethtool -K eth0 gso off
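If disabling offload fixes the ping, that usually points at the known VXLAN checksum-offload kernel issue. The current settings can be inspected with ethtool -k (lowercase), and a narrower variant of the same workaround often suggested is disabling checksum offload on the flannel.1 device itself (interface names are assumptions; the settings do not persist across reboots):

$ ethtool -k eth0 | grep -E 'checksum|generic-segmentation'
$ sudo ethtool -K flannel.1 tx-checksum-ip-generic off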