canonical / microk8s-core-addons

Core MicroK8s addons
Apache License 2.0

(1.28) MetalLB issues #231

Open neoaggelos opened 1 year ago

neoaggelos commented 1 year ago

Summary

In 1.28 we have observed that services of type LoadBalancer using MetalLB are sometimes not accessible from outside the cluster.

This is a placeholder issue, as this is only our current hypothesis about the failures we observe. We will update it as we get more context.

What Should Happen Instead?

LoadBalancer services should be accessible from outside the cluster, and L2 advertisements should work without issues.

Reproduction Steps

  1. Install MicroK8s 1.28
  2. Enable metallb
  3. juju bootstrap microk8s --config controller-service-type=loadbalancer

Can you suggest a fix?

WIP

Are you interested in contributing with a fix?

cc @marosg42

neoaggelos commented 11 months ago

Adding a comment for context around this issue and the problems we observed after digging into it with @marosg42.

This looks related to our update of MetalLB from version 0.13.3 (for MicroK8s 1.27) to 0.13.10 (for MicroK8s 1.28). For now, we have reverted this change (#241), so the 1.28 version of MicroK8s should now deploy a "healthy" and well-known MetalLB configuration.

Things we observed:

This manifests as the LoadBalancer IP address returning a "no route to host" error when trying to access the service.
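To tell this symptom apart from other failures (a refused connection or a plain timeout), a small probe can classify the result of a TCP connection attempt. This is only a sketch: the function name `probe` is made up for illustration, and the address and port in the usage comment are placeholders, not values taken from the failing environment.

```python
import errno
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt to host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "reachable"
    except socket.timeout:
        # No answer at all within the timeout.
        return "timeout"
    except OSError as exc:
        if exc.errno == errno.EHOSTUNREACH:
            # The symptom described in this issue.
            return "no route to host"
        if exc.errno == errno.ECONNREFUSED:
            return "connection refused"
        return f"error: {exc}"


# Hypothetical usage against an address from the MetalLB pool:
#   print(probe("10.0.4.10", 443))
```

Running it from a machine outside the cluster is what matters here, since the service may still be reachable from the nodes themselves.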

Things we found

Things we tried

The default address pool and L2Advertisement that we create look like this:

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default-addresspool
  namespace: metallb-system
spec:
  addresses:
  - 10.0.4.10-10.0.4.20
  autoAssign: true
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default-advertise-all-pools
  namespace: metallb-system

When we found issues, we tried to adjust them like so, in order to specify the exact interface on which MetalLB should advertise the address:


apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default-addresspool
  namespace: metallb-system
spec:
  addresses:
  - 10.0.4.10-10.0.4.20
  autoAssign: true
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default-advertise-all-pools
  namespace: metallb-system
spec:
  interfaces: [enp1s0f0]
  ipAddressPools: [default-addresspool]

This change seems to resolve the MetalLB issues we faced (but the right interface would be hard to guess automatically, so it's not a clean solution at the moment).
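To illustrate why guessing is hard, here is a rough sketch of one possible heuristic: pick the interfaces whose subnet contains the whole address pool. This is not what the addon does. The function names and the host addresses in the example are hypothetical, and in practice the interface-to-address mapping would have to be collected from each node (for example from the JSON output of `ip -j addr`).

```python
import ipaddress


def parse_pool(pool: str):
    """Return (first, last) address of a 'start-end' range or a CIDR pool entry."""
    if "-" in pool:
        start, end = (ipaddress.ip_address(p.strip()) for p in pool.split("-", 1))
        return start, end
    net = ipaddress.ip_network(pool, strict=False)
    return net[0], net[-1]


def interfaces_for_pool(pool: str, host_addrs: dict) -> list:
    """List interfaces whose subnet contains both ends of the pool.

    host_addrs maps an interface name to its addresses in CIDR form.
    """
    start, end = parse_pool(pool)
    matches = []
    for name, cidrs in host_addrs.items():
        for cidr in cidrs:
            net = ipaddress.ip_interface(cidr).network
            if start in net and end in net:
                matches.append(name)
                break
    return matches


# Hypothetical node: only enp1s0f0 sits on the pool's subnet.
EXAMPLE_ADDRS = {
    "lo": ["127.0.0.1/8"],
    "enp1s0f0": ["10.0.4.5/24"],
    "enp1s0f1": ["192.168.1.5/24"],
}
print(interfaces_for_pool("10.0.4.10-10.0.4.20", EXAMPLE_ADDRS))
```

The heuristic breaks down quickly: with bonds, bridges, or VLAN interfaces, several interfaces (or none) can match, and interface names may differ between nodes, which is why a single hard-coded `interfaces` list is not a clean default.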

Finally, we observed repeated logs like this from the hosts:

2023-10-09T07:17:34.137113+00:00 solqa-lab1-maas maas.neighbour: [info] bond0 (bond) on solqa-lab1-maas: IP address 10.246.167.246 on VLAN 2950 moved from ff:ff:ff:ff:ff:ff to 52:54:00:58:80:ee
2023-10-09T07:17:34.144177+00:00 solqa-lab1-maas maas.neighbour: [info] bond0 (bond) on solqa-lab1-maas: IP address 10.246.167.246 moved from ff:ff:ff:ff:ff:ff to 52:54:00:72:38:84
2023-10-09T07:17:34.156987+00:00 solqa-lab1-maas maas.neighbour: [info] bond0 (bond) on solqa-lab1-maas: IP address 10.246.167.245 on VLAN 2950 moved from ff:ff:ff:ff:ff:ff to 52:54:00:b6:c5:11
2023-10-09T07:17:35.012504+00:00 solqa-lab1-maas maas.neighbour: [info] bond0 (bond) on solqa-lab1-maas: IP address 10.246.167.245 on VLAN 2950 moved from 52:54:00:b6:c5:11 to ff:ff:ff:ff:ff:ff
2023-10-09T07:17:35.036040+00:00 solqa-lab1-maas maas.neighbour: [info] bond0 (bond) on solqa-lab1-maas: IP address 10.246.167.246 moved from 52:54:00:72:38:84 to ff:ff:ff:ff:ff:ff
2023-10-09T07:17:35.036706+00:00 solqa-lab1-maas maas.neighbour: [info] bond0 (bond) on solqa-lab1-maas: IP address 10.246.167.245 moved from 52:54:00:b6:c5:11 to ff:ff:ff:ff:ff:ff
2023-10-09T07:17:35.047423+00:00 solqa-lab1-maas maas.neighbour: [info] bond0 (bond) on solqa-lab1-maas: IP address 10.246.167.245 moved from 52:54:00:b6:c5:11 to 52:54:00:bd:39:27
2023-10-09T07:17:35.048360+00:00 solqa-lab1-maas maas.neighbour: [info] bond0 (bond) on solqa-lab1-maas: IP address 10.246.167.245 moved from 52:54:00:b6:c5:11 to 52:54:00:bd:39:27
2023-10-09T07:17:35.069900+00:00 solqa-lab1-maas maas.neighbour: [info] bond0 (bond) on solqa-lab1-maas: IP address 10.246.167.246 moved from 52:54:00:72:38:84 to ff:ff:ff:ff:ff:ff
2023-10-09T07:17:35.081689+00:00 solqa-lab1-maas maas.neighbour: [info] bond0 (bond) on solqa-lab1-maas: IP address 10.246.167.245 moved from 52:54:00:b6:c5:11 to ff:ff:ff:ff:ff:ff
2023-10-09T07:17:35.086501+00:00 solqa-lab1-maas maas.neighbour: [info] bond0 (bond) on solqa-lab1-maas: IP address 10.246.167.245 on VLAN 2950 moved from ff:ff:ff:ff:ff:ff to 52:54:00:b6:c5:11
2023-10-09T07:17:35.096032+00:00 solqa-lab1-maas maas.neighbour: [info] bond0 (bond) on solqa-lab1-maas: IP address 10.246.167.245 moved from 52:54:00:b6:c5:11 to 52:54:00:bd:39:27

We think these are related to the problem. None of this happens on MetalLB 0.13.3.
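To make the log excerpt above easier to read, the following sketch groups the `maas.neighbour` "moved from ... to ..." entries by IP address and lists the distinct MAC addresses seen for each one. The regular expression is inferred only from the lines quoted in this comment, so it is an assumption about the log format, and the function name `macs_per_ip` is made up for illustration.

```python
import re
from collections import defaultdict

BROADCAST = "ff:ff:ff:ff:ff:ff"

# Matches e.g. "IP address 10.246.167.246 on VLAN 2950 moved from <mac> to <mac>";
# the "on VLAN N" part is optional.
MOVE_RE = re.compile(
    r"IP address (?P<ip>\S+)(?: on VLAN (?P<vlan>\d+))? "
    r"moved from (?P<old>[0-9a-f:]{17}) to (?P<new>[0-9a-f:]{17})"
)


def macs_per_ip(lines):
    """Map each IP to the sorted, distinct non-broadcast MACs it moved between."""
    seen = defaultdict(set)
    for line in lines:
        m = MOVE_RE.search(line)
        if m is None:
            continue
        for mac in (m["old"], m["new"]):
            if mac != BROADCAST:
                seen[m["ip"]].add(mac)
    return {ip: sorted(macs) for ip, macs in seen.items()}


# Three lines taken from the excerpt above (timestamps and host prefix trimmed).
SAMPLE = [
    "bond0 (bond) on solqa-lab1-maas: IP address 10.246.167.246 on VLAN 2950 moved from ff:ff:ff:ff:ff:ff to 52:54:00:58:80:ee",
    "bond0 (bond) on solqa-lab1-maas: IP address 10.246.167.246 moved from ff:ff:ff:ff:ff:ff to 52:54:00:72:38:84",
    "bond0 (bond) on solqa-lab1-maas: IP address 10.246.167.245 moved from 52:54:00:b6:c5:11 to 52:54:00:bd:39:27",
]

for ip, macs in macs_per_ip(SAMPLE).items():
    print(ip, macs)
```

On these sample lines each IP ends up with two different MAC addresses, which matches the flapping visible in the raw log. The script only summarises the entries; it does not explain why the addresses move.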