gardener / gardener-extension-networking-cilium

Gardener extension controller for the Cilium CNI network plugin.
https://gardener.cloud
Apache License 2.0

direct routing and BPF datapath of kube-proxy replacement #386

Open hown3d opened 2 months ago

hown3d commented 2 months ago

How to categorize this issue?

/area networking /kind bug

What happened: When Cilium runs as a kube-proxy replacement with the eBPF datapath (to be introduced with https://github.com/gardener/gardener-extension-networking-cilium/pull/350), the lo device is ignored when searching for host addresses: https://github.com/cilium/cilium/blob/9d631b91ad4d2c146d3decbfcfc39968764eb539/pkg/datapath/linux/devices.go#L32-L38 Without a network overlay, this causes requests from inside containers to https://kubernetes to time out.

This is currently not reproducible when running without overlay because bpf-masquerade gets disabled in that case: https://github.com/gardener/gardener-extension-networking-cilium/blob/e6d1fcc9e77f3eb52683955d2144a064e3741b88/charts/internal/cilium/charts/config/templates/configmap.yaml#L335-L337

Cilium will fall back to the legacy host routing implementation instead of using the eBPF datapath:

$ kubectl -n kube-system logs ds/cilium
time="2024-08-07T14:01:05Z" level=info msg="BPF host routing requires enable-bpf-masquerade. Falling back to legacy host routing (enable-host-legacy-routing=true)." subsys=daemon
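The dependency stated in that log line can be sketched as a small check. This is only a simplified model of the single condition the message names, not Cilium's actual decision logic; the function name `usesLegacyHostRouting` and the `cfg` map (standing in for the data of the cilium-config ConfigMap) are assumptions made for illustration.

```go
package main

import "fmt"

// usesLegacyHostRouting models only the condition named in the log line:
// without BPF masquerading, the agent falls back to legacy host routing.
// cfg is a hypothetical stand-in for the data of the cilium-config ConfigMap.
func usesLegacyHostRouting(cfg map[string]string) bool {
	// Legacy host routing was requested explicitly.
	if cfg["enable-host-legacy-routing"] == "true" {
		return true
	}
	// BPF host routing requires enable-bpf-masquerade.
	return cfg["enable-bpf-masquerade"] != "true"
}

func main() {
	withoutMasq := map[string]string{"enable-bpf-masquerade": "false"}
	withMasq := map[string]string{"enable-bpf-masquerade": "true"}

	fmt.Println(usesLegacyHostRouting(withoutMasq)) // fallback expected
	fmt.Println(usesLegacyHostRouting(withMasq))    // eBPF host routing possible
}
```

Under this model, a shoot without overlay stays on legacy host routing until enable-bpf-masquerade is set, which is why the timeout does not show up by default.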

What you expected to happen: Pods are able to access the kube-apiserver via service discovery

How to reproduce it (as minimally and precisely as possible): Create a shoot without overlay and enable the kube-proxy replacement. Then either:

  1. Add enable-bpf-masquerade: true to the cilium-config ConfigMap in kube-system

or

  1. Install the Cilium extension from the branch of PR #350

Example shoot spec to reproduce:

spec:
  kubernetes:
    kubeProxy:
      enabled: false
  networking:
    type: cilium
    providerConfig:
      apiVersion: cilium.networking.extensions.gardener.cloud/v1alpha1
      kind: NetworkConfig
      hubble:
        enabled: true
      tunnel: disabled
      ipv4NativeRoutingCIDREnabled: true
      overlay:
        enabled: false
        createPodRoutes: true

Anything else we need to know?:

Environment:

hown3d commented 1 month ago

Related issue and commit in the cilium repository. Cilium has a hidden flag called --local-max-addr-scope which, after v1.13, defaults to scope link (253) - 1, i.e. 252.

IP addresses on a device whose scope exceeds this limit (e.g. scope host, as created by the apiserver-proxy) will be skipped.
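To make the effect of the scope limit concrete, here is a minimal sketch of filtering addresses by a maximum scope. It is not Cilium's code: the `addr` type, the `filterByMaxScope` helper, and the example IPs (taken from documentation ranges) are hypothetical. Only the numeric scope values come from the Linux rtnetlink ABI.

```go
package main

import "fmt"

// Address scope values as defined by the Linux rtnetlink ABI.
const (
	scopeUniverse = 0
	scopeSite     = 200
	scopeLink     = 253
	scopeHost     = 254
)

// addr is a hypothetical, simplified stand-in for an address reported by netlink.
type addr struct {
	IP    string
	Dev   string
	Scope int
}

// filterByMaxScope keeps only addresses whose scope does not exceed maxScope.
func filterByMaxScope(addrs []addr, maxScope int) []addr {
	var kept []addr
	for _, a := range addrs {
		if a.Scope <= maxScope {
			kept = append(kept, a)
		}
	}
	return kept
}

func main() {
	// Illustrative addresses only; real values depend on the node.
	addrs := []addr{
		{IP: "192.0.2.10", Dev: "eth0", Scope: scopeUniverse},
		{IP: "198.51.100.1", Dev: "lo", Scope: scopeHost},
	}

	// A limit of scope link (253) - 1 = 252 drops the host-scope address.
	for _, a := range filterByMaxScope(addrs, scopeLink-1) {
		fmt.Printf("%s dev %s scope %d\n", a.IP, a.Dev, a.Scope)
	}
}
```

With a limit of 252, only the universe-scope address on eth0 is kept; raising the limit to 254 would also keep the host-scope address on lo.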