canonical / microk8s

MicroK8s is a small, fast, single-package Kubernetes for datacenters and the edge.
https://microk8s.io
Apache License 2.0
8.51k stars 772 forks

Error Enabling Addon "metallb" #3530

Open zenhighzer opened 2 years ago

zenhighzer commented 2 years ago

Summary

Enabling the metallb addon throws an error:

deployment.apps/controller condition met
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "ipaddresspoolvalidationwebhook.metallb.io": failed to call webhook: Post "https://webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-ipaddresspool?timeout=10s": context deadline exceeded
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "l2advertisementvalidationwebhook.metallb.io": failed to call webhook: Post "https://webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-l2advertisement?timeout=10s": context deadline exceeded

Services with type "LoadBalancer" are stuck in state "pending":

k get svc
default   test   LoadBalancer   10.152.183.52   <pending>   80:31110/TCP   44m

What Should Happen Instead?

No errors while enabling the metallb addon, and services with type LoadBalancer should receive an IP address.

Reproduction Steps

Following the guide: https://ubuntu.com/tutorials/how-to-kubernetes-cluster-on-raspberry-pi#1-overview

1) Install Ubuntu on the Pis
2) Edit the cmdline file, adding cgroup_enable=memory cgroup_memory=1, so the whole line reads: cgroup_enable=memory cgroup_memory=1 console=serial0,115200 dwc_otg.lpm_enable=0 console=tty1 root=LABEL=writable rootfstype=ext4 rootwait fixrtc quiet splash
3) Reboot
4) Install MicroK8s via snap
5) Build the cluster via MicroK8s
6) Enable the MicroK8s addons
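The steps above can be sketched as shell commands. This is a hedged sketch: the cmdline path /boot/firmware/cmdline.txt is the usual location for Ubuntu on a Raspberry Pi but is an assumption here, and the addon list mirrors the ones visible in this report.

```shell
# On every Raspberry Pi: enable the memory cgroup in the kernel command line.
# Path and single-line layout of cmdline.txt are assumptions; back up first.
sudo cp /boot/firmware/cmdline.txt /boot/firmware/cmdline.txt.bak
sudo sed -i '1s/^/cgroup_enable=memory cgroup_memory=1 /' /boot/firmware/cmdline.txt
sudo reboot

# After reboot, on every node:
sudo snap install microk8s --classic

# On the first node, print a join command for each additional node:
sudo microk8s add-node
# ...then run the printed "microk8s join ..." command on the other nodes.

# Enable the addons used in this report:
sudo microk8s enable dns ingress metallb
```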

Pods seem to run fine:

NAMESPACE        NAME                                       READY   STATUS    RESTARTS   AGE   IP              NODE   NOMINATED NODE   READINESS GATES
kube-system      calico-kube-controllers-5d7dbf4c7d-vtgzs   1/1     Running   0          65m   10.1.166.193    k8s1   <none>           <none>
kube-system      coredns-d489fb88-8wtb9                     1/1     Running   0          53m   10.1.219.1      k8s3   <none>           <none>
ingress          nginx-ingress-microk8s-controller-bf7r7    1/1     Running   0          48m   10.1.219.2      k8s3   <none>           <none>
ingress          nginx-ingress-microk8s-controller-cc8c5    1/1     Running   0          48m   10.1.166.194    k8s1   <none>           <none>
ingress          nginx-ingress-microk8s-controller-9sv6t    1/1     Running   0          48m   10.1.109.66     k8s2   <none>           <none>
default          test-75d6d47c7f-rrrgx                      1/1     Running   0          43m   10.1.109.67     k8s2   <none>           <none>
default          test-75d6d47c7f-8r9s2                      1/1     Running   0          43m   10.1.219.3      k8s3   <none>           <none>
default          test-75d6d47c7f-75ttz                      1/1     Running   0          43m   10.1.166.195    k8s1   <none>           <none>
metallb-system   controller-56c4696b5-gsxpc                 1/1     Running   0          16m   10.1.109.69     k8s2   <none>           <none>
metallb-system   speaker-bx9c8                              1/1     Running   0          16m   192.168.80.12   k8s2   <none>           <none>
metallb-system   speaker-h5h4x                              1/1     Running   0          16m   192.168.80.11   k8s1   <none>           <none>
metallb-system   speaker-xstvh                              1/1     Running   0          16m   192.168.80.13   k8s3   <none>           <none>
kube-system      calico-node-pm6cw                          1/1     Running   0          57m   192.168.80.11   k8s1   <none>           <none>
kube-system      calico-node-krhc2                          1/1     Running   0          56m   192.168.80.12   k8s2   <none>           <none>
kube-system      calico-node-bszf9                          1/1     Running   0          56m   192.168.80.13   k8s3   <none>           <none>

Introspection Report

After running microk8s inspect there is an error: "The memory cgroup is not enabled, but it should be" (even though the cgroup boot parameters were added; see Reproduction Steps).

Can you suggest a fix?

I tried the same setup with Ubuntu 20.04.5: no errors, and services with type LoadBalancer receive an IP. So the error must have something to do with Ubuntu 22.10.

Are you interested in contributing with a fix?

I would like to help, but I don't know how.

inspection-report-20221027_133229.tar.gz

zacbayhan commented 2 years ago

I was recently working on a similar issue, and I determined the proxy wasn't set correctly in /etc/environment or /etc/profile.d/proxy.sh. Do you get anything from running curl -vvv against https://webhook-service.metallb-system.svc, or from dig webhook-service.metallb-system.svc?
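A hedged sketch of those checks from one of the nodes. Assumptions: 10.152.183.10 is the default MicroK8s cluster DNS Service IP (adjust if yours differs); dig takes a hostname, not a URL; curl needs -k because the webhook serves a self-signed certificate; and the node itself may not resolve cluster DNS names, in which case the endpoint pod IP can be substituted.

```shell
# Does the webhook Service have endpoints at all?
microk8s kubectl -n metallb-system get endpoints webhook-service

# Does cluster DNS resolve the service name?
dig +short webhook-service.metallb-system.svc.cluster.local @10.152.183.10

# Can the node reach the webhook? A hang here matches the
# "context deadline exceeded" error from the API server.
# Use the endpoint pod IP instead of the hostname if the node
# does not resolve cluster DNS names.
curl -vk --max-time 10 https://webhook-service.metallb-system.svc/
```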

panlinux commented 1 year ago

I'm seeing the same issue on two Ubuntu 22.10 systems. I opened a thread on Discourse: https://discuss.kubernetes.io/t/error-enabling-metallb-internal-error-context-deadline-exceeded/22092/

panlinux commented 1 year ago

I repeated my same steps on an Ubuntu 22.04 LTS install, and this time it all worked.

More specifically, I first retried a simpler case in VMs, without involving metallb, and found that connections to a service IP were flaky and only completed quickly when the endpoint being hit happened to be on the same node. I retested that scenario with Ubuntu 22.10 and 22.04, and it consistently failed when the OS was Ubuntu 22.10.

gcraenen commented 1 year ago

I'm having the same issues with 3 Minisforum NUCs and Ubuntu 22.10.

zacbayhan commented 1 year ago

Looking at the discussion panlinux posted, it looks like you are hitting a webhook error:

Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "ipaddresspoolvalidationwebhook.metallb.io": failed to call webhook: Post "https://webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-ipaddresspool?timeout=10s": context deadline exceeded
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "l2advertisementvalidationwebhook.metallb.io": failed to call webhook: Post "https://webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-l2advertisement?timeout=10s": context deadline exceeded

try running kubectl get validatingwebhookconfiguration -o yaml (the resource is cluster-scoped, so no namespace flag is needed)

and see if failurePolicy is set to Fail. I believe you can set it to Ignore.

It also looks like it might be getting hung up on a proxy, so that might be another option to look into.
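A hedged sketch of inspecting and, if you accept the risk, relaxing the failure policy. The object name metallb-webhook-configuration is what MetalLB's manifests usually create, but it is an assumption here; with failurePolicy: Ignore, invalid IPAddressPool/L2Advertisement objects are accepted without validation, so this hides the timeout rather than fixing it.

```shell
# Show the current failure policy of MetalLB's validating webhook.
microk8s kubectl get validatingwebhookconfiguration \
  metallb-webhook-configuration -o yaml | grep failurePolicy

# Switch the first webhook entry to Ignore (repeat with other
# /webhooks/<n> indices if the configuration lists several).
# WARNING: this skips validation instead of fixing the connectivity issue.
microk8s kubectl patch validatingwebhookconfiguration \
  metallb-webhook-configuration --type=json \
  -p '[{"op":"replace","path":"/webhooks/0/failurePolicy","value":"Ignore"}]'
```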

panlinux commented 1 year ago

> and see if failurePolicy is set to Fail, I believe you can set it to ignore.

It's Fail indeed, but before ignoring an error it's important to understand why it's happening, and why only on Ubuntu Kinetic (22.10); it works on Jammy (22.04).

> It looks like it might be getting hung on proxy? so that might be another option to look into

There is no proxy here. This can easily be replicated in a Kinetic VM; I just did it now with MicroK8s 1.25.4 and two Kinetic VMs.

neoaggelos commented 1 year ago

Hi @panlinux, this could be related to a vxlan bug that breaks checksum calculation.

Could you try to see whether:

microk8s kubectl patch felixconfigurations default --patch '{"spec":{"featureDetectOverride":"ChecksumOffloadBroken=true"}}' --type=merge

helps with your issue?
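For context, the felixconfigurations patch above tells Calico to treat checksum offload as broken on the VXLAN path. A commonly cited OS-level equivalent is to disable transmit checksum offload on the VXLAN device directly; the interface name vxlan.calico is Calico's usual choice but is an assumption here.

```shell
# Disable TX checksum offload on Calico's VXLAN device on each node.
# This does not persist across reboots; a systemd unit or udev rule
# would be needed to make it permanent.
sudo ethtool -K vxlan.calico tx-checksum-ip-generic off
```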

kathoef commented 1 year ago

It seems to help to put e.g. metallb-system.svc into the set of no_proxy variables:

$ cat /etc/environment
...
NO_PROXY=127.0.0.1,::1,localhost,10.152.183.0/24,10.1.0.0/16,metallb-system.svc
no_proxy=127.0.0.1,::1,localhost,10.152.183.0/24,10.1.0.0/16,metallb-system.svc
...

as suggested somewhere above and in this metallb repo issue. (I had activated the DNS addon before activating the metallb addon, i.e. microk8s enable dns, if that is important.)
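Why the extra entry helps can be sketched without a cluster: most HTTP clients (curl, Go's net/http, and therefore proxied webhook calls) treat each no_proxy entry as a hostname suffix, so "metallb-system.svc" matches "webhook-service.metallb-system.svc". A minimal POSIX-shell illustration of that matching rule, using the host and list from this thread:

```shell
#!/bin/sh
# Emulate the common suffix-match rule for no_proxy entries to show why
# adding "metallb-system.svc" makes webhook requests bypass the proxy.
no_proxy="127.0.0.1,::1,localhost,10.152.183.0/24,10.1.0.0/16,metallb-system.svc"
host="webhook-service.metallb-system.svc"

bypass=proxied
IFS=','
for entry in $no_proxy; do
  case "$host" in
    *"$entry") bypass=bypassed ;;   # entry is a suffix of the host
  esac
done
echo "$bypass"   # prints: bypassed
```

Real clients add refinements (dot-boundary checks, CIDR handling), but the suffix rule is the part that makes this workaround effective.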

natarajmb commented 1 year ago

@neoaggelos thanks for the workaround. I just tried your fix and it works. My setup:

4 x RPi 4B running 22.10 server with the DNS addon.

I patched felixconfigurations and enabled metallb, and I'm not seeing any errors. I had existing BGP configurations and all are working as expected. Thank you 👍

risha700 commented 10 months ago

spec: KUBE_VER="v1.29" METALLB_VER="v0.13.12" CALICO_VER="v3.27.0" Ubuntu 22.04.3 LTS

The same "unreachable" error. Cause: networking misconfiguration. Check your firewall and connectivity before proceeding with any installations. In my case ICMP to the direct IP was unreachable and traffic wasn't routable from the master to the workers. That answers the "why", @panlinux.