kubelet port 10250 not available for pods

Is there an existing issue for this?

[X] I have searched the existing issues

What happened?

Preface: I have migrated a cluster from Calico to Cilium. Switched completely to eBPF mode. Removed all calico and kube-proxy components. Policy enforcement is set as 'default', meaning by default communications are allowed. Also enabled host firewall, but no cluster wide / host policies are present so far. Routing mode is left default - encapsulation. Full final values file will be attached in the end.

Issue: After the move, the communication to kubelet port 10250 to scrape metrics got broken. Neither metrics-server, nor prometheus pods can connect to this endpoint to collect data.

The error that metrics-server prints during the failure:

E0731 09:18:55.370578       1 scraper.go:140] "Failed to scrape node" err="Get \"https://10.30.7.121:10250/metrics/resource\": context deadline exceeded" node="wallet7"

Things tried: 1) Removed all netpol and cnp resources in kube-system namespace to make sure metrics server is unrestricted. 2) Stopped UFW on the nodes to make sure it doesn't affect it. In the setup, UFW doesn't restrict backnet communications at all, all k8s nodes communicate with no blocking policies on OS level. 3) hubble observe to port 10250 show traffic is forwarded and not dropped. 4) Tried adding allowing CNP that didn't change anything:

spec:
  egress:
  - toEntities:
    - host
    - remote-node
  endpointSelector:
    matchLabels:
      app.kubernetes.io/name: metrics-server

5) Tried to tcpdump traffic on the node, it shows that the packet reaches cilium_net interface (for the node where both metrics-server and target kubelet reside). But confirmed with strace that kubelet pid does not receive this connection. So it gets lost somewhere between cilium_net and host routing. 6) It's important to note that communication to kube-apiserver (node ip and port 6443) works fine, but it's a pod running in hostNetwork mode on the same node.

Can I ask for any tips/hints what could cause such behavior? Or any troubleshooting items to try?

Cilium Version

cilium-cli: v0.15.3 compiled with go1.20.4 on darwin/arm64
cilium image (default): v1.13.4
cilium image (stable): v1.14.0
cilium image (running): 1.13.4

Kernel Version

5.15.0-78-generic

Host OS is Ubuntu 22.04

Kubernetes Version

Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.6", GitCommit:"ff2c119726cc1f8926fb0585c74b25921e866a28", GitTreeState:"clean", BuildDate:"2023-01-18T19:15:26Z", GoVersion:"go1.19.5", Compiler:"gc", Platform:"linux/amd64"}

Sysdump

cilium-sysdump-20230731-161450.zip

Relevant log output

tcpdump on cilium_net interface

root@wallet7:~# tcpdump -n -i cilium_net 'port 10250'
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on cilium_net, link-type EN10MB (Ethernet), snapshot length 262144 bytes
09:36:04.186886 IP 10.244.1.64.46150 > 10.30.7.121.10250: Flags [S], seq 4292016426, win 64860, options [mss 1410,sackOK,TS val 3365305791 ecr 0,nop,wscale 7], length 0
09:36:05.293010 IP 10.244.1.145.40826 > 10.30.7.121.10250: Flags [S], seq 1686743227, win 64860, options [mss 1410,sackOK,TS val 285850807 ecr 0,nop,wscale 7], length 0
09:36:06.294836 IP 10.244.1.145.40826 > 10.30.7.121.10250: Flags [S], seq 1686743227, win 64860, options [mss 1410,sackOK,TS val 285851809 ecr 0,nop,wscale 7], length 0

hubble observe:

root@wallet7:/home/cilium# hubble observe --to-port 10250 --to-ip 10.30.7.121 | tail -10
Jul 31 09:37:45.562: observability/prometheus-kube-prometheus-stack-prometheus-0:50584 (ID:53295) -> 10.30.7.121:10250 (host) to-stack FORWARDED (TCP Flags: SYN)
Jul 31 09:37:49.146: kube-system/metrics-server-6bd8d699c5-xjcc6:39112 (ID:15550) -> 10.30.7.121:10250 (host) to-host FORWARDED (TCP Flags: SYN)
Jul 31 09:37:49.146: kube-system/metrics-server-6bd8d699c5-xjcc6:39112 (ID:15550) -> 10.30.7.121:10250 (host) to-stack FORWARDED (TCP Flags: SYN)
Jul 31 09:37:49.503: observability/prometheus-kube-prometheus-stack-prometheus-0:57470 (ID:53295) -> 10.30.7.121:10250 (host) policy-verdict:L3-Only EGRESS ALLOWED (TCP Flags: SYN)
Jul 31 09:37:49.503: observability/prometheus-kube-prometheus-stack-prometheus-0:57470 (ID:53295) -> 10.30.7.121:10250 (host) to-host FORWARDED (TCP Flags: SYN)
Jul 31 09:37:49.503: observability/prometheus-kube-prometheus-stack-prometheus-0:57470 (ID:53295) -> 10.30.7.121:10250 (host) to-stack FORWARDED (TCP Flags: SYN)
Jul 31 09:37:56.570: observability/prometheus-kube-prometheus-stack-prometheus-0:57470 (ID:53295) -> 10.30.7.121:10250 (host) to-host FORWARDED (TCP Flags: SYN)
Jul 31 09:37:56.570: observability/prometheus-kube-prometheus-stack-prometheus-0:57470 (ID:53295) -> 10.30.7.121:10250 (host) to-stack FORWARDED (TCP Flags: SYN)
Jul 31 09:37:56.876: kube-system/metrics-server-6bd8d699c5-xjcc6:33320 (ID:15550) -> 10.30.7.121:10250 (host) to-host FORWARDED (TCP Flags: SYN)
Jul 31 09:37:56.876: kube-system/metrics-server-6bd8d699c5-xjcc6:33320 (ID:15550) -> 10.30.7.121:10250 (host) to-stack FORWARDED (TCP Flags: SYN)

Anything else?

cilium installation values:

bpf:
  hostLegacyRouting: false
  masquerade: true
cluster:
  name: cluster.local
cni:
  customConf: false
  uninstall: false
hostFirewall:
  enabled: true
hostPort:
  enabled: true
hubble:
  relay:
    enabled: true
    tolerations:
    - key: node-role.kubernetes.io/control-plane
      operator: Exists
  ui:
    enabled: true
    frontend:
      server:
        ipv6:
          enabled: false
    tolerations:
    - key: node-role.kubernetes.io/control-plane
      operator: Exists
ipam:
  mode: cluster-pool
  operator:
    clusterPoolIPv4PodCIDRList:
    - 10.244.0.0/16
kubeProxyReplacement: strict
k8sServiceHost: 10.30.7.121
k8sServicePort: 6443
operator:
  replicas: 1
  unmanagedPodWatcher:
    restart: true
policyEnforcementMode: default
serviceAccounts:
  cilium:
    name: cilium
  operator:
    name: cilium-operator
tunnel: vxlan
tunnelPort: 8473

Code of Conduct

[X] I agree to follow this project's Code of Conduct

cilium / cilium