flannel-io / flannel

flannel is a network fabric for containers, designed for Kubernetes
Apache License 2.0
no route to host from all nodes to all services except to kubernetes service #1971

Open UriZafrir opened 1 month ago

UriZafrir commented 1 month ago

Hi everyone, I'm running an RKE cluster. I get "no route to host" when trying to reach services from a node.

k get svc -A
argocd          argo-cd-argocd-applicationset-controller   ClusterIP      10.43.71.196    <none>                          7000/TCP                     9d
argocd          argo-cd-argocd-dex-server                  ClusterIP      10.43.60.116    <none>                          5556/TCP,5557/TCP            9d
argocd          argo-cd-argocd-redis                       ClusterIP      10.43.37.182    <none>                          6379/TCP                     9d
argocd          argo-cd-argocd-repo-server                 ClusterIP      10.43.200.3     <none>                          8081/TCP                     9d
argocd          argo-cd-argocd-server                      ClusterIP      10.43.229.66    <none>                          80/TCP,443/TCP               9d
default         kubernetes                                 ClusterIP      10.43.0.1       <none>                          443/TCP                      9d
ingress-nginx   ingress-nginx-controller                   LoadBalancer   10.43.70.189    172.20.121.173,172.20.121.174   80:30996/TCP,443:32439/TCP   9d
ingress-nginx   ingress-nginx-controller-admission         ClusterIP      10.43.137.222   <none>                          443/TCP                      9d
kube-system     kube-dns                                   ClusterIP      10.43.0.10      <none>                          53/UDP,53/TCP,9153/TCP       9d
kube-system     metrics-server                             ClusterIP      10.43.183.119   <none>                          443/TCP                      7d12h
kubeshark       kubeshark-front                            ClusterIP      10.43.200.80    <none>                          80/TCP                       7d17h
kubeshark       kubeshark-hub                              ClusterIP      10.43.162.11    <none>                          80/TCP                       7d17h
kubeshark       kubeshark-worker-metrics                   ClusterIP      10.43.64.10     <none>                          49100/TCP                    7d17h
telnet  10.43.0.10 53
Trying 10.43.0.10...
telnet: connect to address 10.43.0.10: No route to host
telnet 10.43.229.66 443
Trying 10.43.229.66...
telnet: connect to address 10.43.229.66: No route to host
telnet 10.43.70.189 80
Trying 10.43.70.189...
telnet: connect to address 10.43.70.189: No route to host
telnet 10.43.137.222 443
Trying 10.43.137.222...
telnet: connect to address 10.43.137.222: No route to host

This is the debugging flow I followed. I got this line when running k get pods:

E0519 05:23:36.925419 1110186 memcache.go:287] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request

Checking the apiservices, I got a FailedDiscoveryCheck for the metrics server:

kubectl get apiservices
v1beta1.metrics.k8s.io                 kube-system/metrics-server   False (FailedDiscoveryCheck)   7d12h

When describing the apiservice, I got:

Message: failing or missing response from https://10.43.183.119:443/apis/metrics.k8s.io/v1beta1: Get "https://10.43.183.119:443/apis/metrics.k8s.io/v1beta1": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

kubectl describe apiservice v1beta1.metrics.k8s.io
E0519 05:30:38.505746 1113885 memcache.go:287] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0519 05:30:38.535446 1113885 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0519 05:30:38.538759 1113885 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0519 05:30:38.542372 1113885 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Name:         v1beta1.metrics.k8s.io
Namespace:
Labels:       k8s-app=metrics-server
Annotations:  <none>
API Version:  apiregistration.k8s.io/v1
Kind:         APIService
Metadata:
  Creation Timestamp:  2024-05-11T13:38:43Z
  Resource Version:    1332438
  UID:                 ae69ae9d-f893-400b-b993-7be2e8af833b
Spec:
  Group:                     metrics.k8s.io
  Group Priority Minimum:    100
  Insecure Skip TLS Verify:  true
  Service:
    Name:            metrics-server
    Namespace:       kube-system
    Port:            443
  Version:           v1beta1
  Version Priority:  100
Status:
  Conditions:
    Last Transition Time:  2024-05-11T13:38:43Z
    Message:               failing or missing response from https://10.43.183.119:443/apis/metrics.k8s.io/v1beta1: Get "https://10.43.183.119:443/apis/metrics.k8s.io/v1beta1": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
    Reason:                FailedDiscoveryCheck
    Status:                False
    Type:                  Available
Events:                    <none>

After that, I tried to telnet to the services and discovered that the problem is not limited to the metrics-server service.

I would appreciate some assistance.

Your Environment

rbrtbnfgl commented 1 month ago

Are you using Canal? You mention Calico and Flannel versions, and the Flannel version is very old. Service IP translation is not done by the CNI. You could check your iptables rules to see whether the IP translation is there. You also need a default route on your system.
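
The two checks suggested here could be sketched roughly as follows. This is a minimal sketch, assuming kube-proxy runs in iptables mode (so ClusterIP rules appear in the `KUBE-SERVICES` chain of the nat table) and that `iptables-save` and `ip` are available on the node. The helper names `has_service_rule` and `has_default_route` are hypothetical, introduced only for illustration.

```shell
#!/bin/sh
# Hypothetical helpers; they only inspect text on stdin, so they need no root.

# has_service_rule CLUSTER_IP
# Reads `iptables-save -t nat` output on stdin and succeeds if the
# KUBE-SERVICES chain contains a rule matching the given ClusterIP.
# Assumption: kube-proxy is in iptables mode.
has_service_rule() {
    grep -F -- "-A KUBE-SERVICES" | grep -qF -- "-d $1/32"
}

# has_default_route
# Reads `ip route show` output on stdin and succeeds if a default route exists.
has_default_route() {
    grep -q '^default '
}

# Usage on a node (iptables-save requires root):
#   iptables-save -t nat | has_service_rule 10.43.0.10 \
#       && echo "translation rule present" || echo "translation rule missing"
#   ip route show | has_default_route \
#       && echo "default route present" || echo "default route missing"
```

If the rule for a service IP is missing, the problem is more likely in kube-proxy than in the CNI. If the rules are present but no default route exists, traffic to the service range may have no matching route, which would fit the "No route to host" errors above.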

UriZafrir commented 1 month ago

I will check my iptables rules. Yes, I'm using Canal, on RKE, 1.24.