canonical / k8s-snap

Canonical Kubernetes is an opinionated and CNCF conformant Kubernetes operated by Snaps and Charms, which come together to bring simplified operations and an enhanced security posture on any infrastructure.
GNU General Public License v3.0

Can't access l2 loadbalancers with cilium CNI from a VM #702

Closed. gboutry closed this issue 4 weeks ago.

gboutry commented 1 month ago

Summary

I have a simple setup where I'm running LXD and K8s on the same laptop. K8s is bootstrapped on the first IP address of lxdbr0 (the host's IP on this network) and is configured to advertise a given address range to provide the load-balancer feature.

When deploying with the Cilium-based k8s snap, I can't reach the LoadBalancer services exposed by K8s from an LXD VM:

ubuntu@juju-5dd47c-0:~$ curl http://10.206.54.230
curl: (7) Failed to connect to 10.206.54.230 port 80 after 3073 ms: No route to host

When deploying with MicroK8s or the moonray track, the same request works correctly:

ubuntu@juju-5dd47c-0:~$ curl http://10.206.54.230
404 page not found

Setup diagram (Mermaid architecture-beta syntax):

architecture-beta
    group laptop(server)[Laptop]

    service nic(internet)[LXD BR] in laptop
    service vm(server)[LXD VM] in laptop
    service k8s(server)[K8S Snap] in laptop

    vm:L -- R:nic
    k8s:B -- T:nic

What Should Happen Instead?

I should be able to reach the L2-advertised IP addresses on my laptop from my LXD VMs.

Reproduction Steps

  1. snap install k8s --channel latest/edge --classic
  2. sudo k8s bootstrap --address 10.206.54.1 --file k8s.yaml
  3. sudo k8s config | juju add-k8s ck8s --controller lxd
  4. juju add-model ingressed ck8s
  5. juju deploy traefik-k8s --channel 1.0/beta --trust
  6. juju add-model compute lxd
  7. juju add-machine --constraints virt-type=virtual-machine
  8. juju ssh 0 curl http://10.206.54.230 (this is the IP address assigned to the LoadBalancer service)

k8s.yaml:

cluster-config:
  network:
    enabled: true
  dns:
    enabled: true
    upstream-nameservers:
      - 10.206.54.1
  load-balancer:
    enabled: true
    cidrs:
    - 10.206.54.230-10.206.54.239
    l2-mode: true
  local-storage:
    enabled: true
    default: true
  ingress:
    enabled: false
  gateway:
    enabled: false
  metrics-server:
    enabled: true

System information

PRETTY_NAME="Ubuntu 24.04.1 LTS" NAME="Ubuntu" VERSION_ID="24.04" VERSION="24.04.1 LTS (Noble Numbat)" VERSION_CODENAME=noble ID=ubuntu

Can you suggest a fix?

No response

Are you interested in contributing with a fix?

No response

gboutry commented 1 month ago

GitHub fails to attach my inspection reports. Is there another place where I can upload them?

berkayoz commented 1 month ago

Hey @gboutry, thanks for posting this issue.

I was able to reproduce the issue, and it seems to be related to Cilium failing to auto-detect (choose) the correct interface to route with. I'll raise this for discussion with the team, and we'll decide on a solution, which might include choosing the correct interface on our side.

More technically, an interface that does not carry the default route seems to be de-prioritized during auto-detection, so we might need to set Cilium's devices option manually, based on the node IP supplied at bootstrap (Cilium docs for reference).