Open neoaggelos opened 1 year ago
Adding a comment for context around this issue and the problems we observed after digging into it with @marosg42.
This looks related to our update of MetalLB from version 0.13.3 (for microk8s 1.27) to 0.13.10 (for microk8s 1.28). We have reverted this change for now (#241), so the 1.28 version of microk8s should now deploy a "healthy" and well-known MetalLB configuration.
The problem manifests as the LoadBalancer IP address returning a "no route to host" error when trying to access the service.
Glancing at the release notes (https://metallb.universe.tf/release-notes/), MetalLB has introduced a few changes in the way that L2 advertisements are handled. The most relevant ones look to be:
The default address pool and L2Advertisement that we create look like this:
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default-addresspool
  namespace: metallb-system
spec:
  addresses:
  - 10.0.4.10-10.0.4.20
  autoAssign: true
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default-advertise-all-pools
  namespace: metallb-system
When we found issues, we tried adjusting the resources as follows, in order to specify the exact interface on which MetalLB should advertise the address:
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default-addresspool
  namespace: metallb-system
spec:
  addresses:
  - 10.0.4.10-10.0.4.20
  autoAssign: true
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default-advertise-all-pools
  namespace: metallb-system
spec:
  interfaces: [enp1s0f0]
  ipAddressPools: [default-addresspool]
This change seems to resolve the MetalLB issues we faced. However, the correct interface name would be hard to guess automatically, so it's not a clean solution at the moment.
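To illustrate why picking the interface automatically is not trivial, here is a minimal sketch of one possible heuristic: choose the host interfaces whose subnet contains the whole address pool range. This is not what microk8s or MetalLB does. The function names and the `interfaces` mapping (including the host addresses in it) are hypothetical and only for illustration; a real implementation would have to read the addresses from each node.

```python
import ipaddress


def parse_pool_range(spec):
    """Parse a 'start-end' range or a CIDR into (first, last) addresses."""
    if "-" in spec:
        start, end = (ipaddress.ip_address(p.strip()) for p in spec.split("-", 1))
    else:
        net = ipaddress.ip_network(spec, strict=False)
        start, end = net[0], net[-1]
    return start, end


def candidate_interfaces(pool_spec, interface_cidrs):
    """Return interface names whose subnet contains the whole pool range."""
    start, end = parse_pool_range(pool_spec)
    matches = []
    for name, cidr in interface_cidrs.items():
        net = ipaddress.ip_interface(cidr).network
        if start in net and end in net:
            matches.append(name)
    return matches


# Hypothetical host addressing (assumption, not taken from the affected hosts).
interfaces = {
    "enp1s0f0": "10.0.4.2/24",
    "enp1s0f1": "192.168.1.5/24",
    "lo": "127.0.0.1/8",
}

print(candidate_interfaces("10.0.4.10-10.0.4.20", interfaces))
```

With this made-up addressing only `enp1s0f0` matches. The heuristic can still return zero or several interfaces (for example with bonds, VLANs, or bridges on the same subnet), and different nodes may use different interface names, which is part of why this is hard to automate.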
Finally, we observed repeated logs like this from the hosts:
2023-10-09T07:17:34.137113+00:00 solqa-lab1-maas maas.neighbour: [info] bond0 (bond) on solqa-lab1-maas: IP address 10.246.167.246 on VLAN 2950 moved from ff:ff:ff:ff:ff:ff to 52:54:00:58:80:ee
2023-10-09T07:17:34.144177+00:00 solqa-lab1-maas maas.neighbour: [info] bond0 (bond) on solqa-lab1-maas: IP address 10.246.167.246 moved from ff:ff:ff:ff:ff:ff to 52:54:00:72:38:84
2023-10-09T07:17:34.156987+00:00 solqa-lab1-maas maas.neighbour: [info] bond0 (bond) on solqa-lab1-maas: IP address 10.246.167.245 on VLAN 2950 moved from ff:ff:ff:ff:ff:ff to 52:54:00:b6:c5:11
2023-10-09T07:17:35.012504+00:00 solqa-lab1-maas maas.neighbour: [info] bond0 (bond) on solqa-lab1-maas: IP address 10.246.167.245 on VLAN 2950 moved from 52:54:00:b6:c5:11 to ff:ff:ff:ff:ff:ff
2023-10-09T07:17:35.036040+00:00 solqa-lab1-maas maas.neighbour: [info] bond0 (bond) on solqa-lab1-maas: IP address 10.246.167.246 moved from 52:54:00:72:38:84 to ff:ff:ff:ff:ff:ff
2023-10-09T07:17:35.036706+00:00 solqa-lab1-maas maas.neighbour: [info] bond0 (bond) on solqa-lab1-maas: IP address 10.246.167.245 moved from 52:54:00:b6:c5:11 to ff:ff:ff:ff:ff:ff
2023-10-09T07:17:35.047423+00:00 solqa-lab1-maas maas.neighbour: [info] bond0 (bond) on solqa-lab1-maas: IP address 10.246.167.245 moved from 52:54:00:b6:c5:11 to 52:54:00:bd:39:27
2023-10-09T07:17:35.048360+00:00 solqa-lab1-maas maas.neighbour: [info] bond0 (bond) on solqa-lab1-maas: IP address 10.246.167.245 moved from 52:54:00:b6:c5:11 to 52:54:00:bd:39:27
2023-10-09T07:17:35.069900+00:00 solqa-lab1-maas maas.neighbour: [info] bond0 (bond) on solqa-lab1-maas: IP address 10.246.167.246 moved from 52:54:00:72:38:84 to ff:ff:ff:ff:ff:ff
2023-10-09T07:17:35.081689+00:00 solqa-lab1-maas maas.neighbour: [info] bond0 (bond) on solqa-lab1-maas: IP address 10.246.167.245 moved from 52:54:00:b6:c5:11 to ff:ff:ff:ff:ff:ff
2023-10-09T07:17:35.086501+00:00 solqa-lab1-maas maas.neighbour: [info] bond0 (bond) on solqa-lab1-maas: IP address 10.246.167.245 on VLAN 2950 moved from ff:ff:ff:ff:ff:ff to 52:54:00:b6:c5:11
2023-10-09T07:17:35.096032+00:00 solqa-lab1-maas maas.neighbour: [info] bond0 (bond) on solqa-lab1-maas: IP address 10.246.167.245 moved from 52:54:00:b6:c5:11 to 52:54:00:bd:39:27
We think these should be related to the problem. None of this happens on MetalLB 0.13.3.
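To make the pattern in these logs easier to see, the following is a rough sketch that counts how often each IP address "moved" and which non-broadcast MAC addresses it was seen with. The regular expression is an assumption based only on the log lines quoted above, and `summarize_moves` is a hypothetical helper name. An address seen with more than one MAC may indicate that several hosts are answering for it, but that reading is part of our hypothesis, not a confirmed diagnosis.

```python
import re
from collections import defaultdict

# Pattern inferred from the maas.neighbour lines quoted above (assumption).
MOVE_RE = re.compile(
    r"IP address (?P<ip>\S+)(?: on VLAN (?P<vlan>\d+))? "
    r"moved from (?P<old>[0-9a-f:]{17}) to (?P<new>[0-9a-f:]{17})"
)
BROADCAST = "ff:ff:ff:ff:ff:ff"


def summarize_moves(lines):
    """Return {ip: {"moves": int, "macs": set}}, ignoring the broadcast MAC."""
    summary = defaultdict(lambda: {"moves": 0, "macs": set()})
    for line in lines:
        m = MOVE_RE.search(line)
        if not m:
            continue  # not a "moved from ... to ..." line
        entry = summary[m["ip"]]
        entry["moves"] += 1
        entry["macs"].update({m["old"], m["new"]} - {BROADCAST})
    return dict(summary)


# Three of the lines quoted above, used as sample input.
sample = [
    "2023-10-09T07:17:34.137113+00:00 solqa-lab1-maas maas.neighbour: [info] bond0 (bond) on solqa-lab1-maas: IP address 10.246.167.246 on VLAN 2950 moved from ff:ff:ff:ff:ff:ff to 52:54:00:58:80:ee",
    "2023-10-09T07:17:34.144177+00:00 solqa-lab1-maas maas.neighbour: [info] bond0 (bond) on solqa-lab1-maas: IP address 10.246.167.246 moved from ff:ff:ff:ff:ff:ff to 52:54:00:72:38:84",
    "2023-10-09T07:17:35.047423+00:00 solqa-lab1-maas maas.neighbour: [info] bond0 (bond) on solqa-lab1-maas: IP address 10.246.167.245 moved from 52:54:00:b6:c5:11 to 52:54:00:bd:39:27",
]

for ip, info in sorted(summarize_moves(sample).items()):
    print(ip, info["moves"], sorted(info["macs"]))
```

On the three sample lines, both 10.246.167.245 and 10.246.167.246 show up with two distinct MAC addresses each.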
Summary
In 1.28 we have observed that services of type LoadBalancer using MetalLB are sometimes not accessible from outside the cluster.
This is a placeholder PR, as this is our current hypothesis about the failures we observe. We will update accordingly as we get more context.
What Should Happen Instead?
LoadBalancer services should be accessible from outside the cluster, and L2 advertisements should work without issues.
Reproduction Steps
juju bootstrap microk8s --config controller-service-type=loadbalancer
Can you suggest a fix?
WIP
Are you interested in contributing with a fix?
cc @marosg42