cloudfoundry / cf-k8s-networking

building a cloud foundry without gorouter....
Apache License 2.0
32 stars, 17 forks

[BUG] LeaderElection is misconfigured in routecontroller #75

Closed tcdowney closed 3 years ago

tcdowney commented 3 years ago

Summary

As part of upgrading Golang and other dependencies in routecontroller, we uncovered a bug in how we were configuring LeaderElection. Leader election was enabled on the routecontroller in #175193243, but the configuration had several issues.

The first issue was due to https://github.com/kubernetes-sigs/controller-runtime/issues/445, which requires that LeaderElectionID and LeaderElectionNamespace be explicitly set.
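For reference, controller-runtime exposes these settings as the LeaderElection, LeaderElectionID, and LeaderElectionNamespace fields of its manager options. The sketch below is not the routecontroller code. It uses a local stand-in struct (leaderElectionOptions, a hypothetical name) with the same field names so that it runs with the standard library alone, plus a hypothetical validate helper that refuses to proceed when leader election is enabled but either field is empty. The ID and namespace values are taken from the lock name cf-system/cf-k8s-networking-routecontroller that appears in the logs in this issue.

```go
package main

import (
	"errors"
	"fmt"
)

// leaderElectionOptions is a local stand-in (hypothetical) that mirrors the
// leader election field names on controller-runtime's manager options.
// It is NOT the real controller-runtime type.
type leaderElectionOptions struct {
	LeaderElection          bool
	LeaderElectionID        string
	LeaderElectionNamespace string
}

// validate is a hypothetical guard: when leader election is enabled, both the
// lock ID and the lock namespace must be set explicitly.
func validate(o leaderElectionOptions) error {
	if !o.LeaderElection {
		return nil // nothing to check when leader election is off
	}
	if o.LeaderElectionID == "" {
		return errors.New("LeaderElectionID must be set explicitly")
	}
	if o.LeaderElectionNamespace == "" {
		return errors.New("LeaderElectionNamespace must be set explicitly")
	}
	return nil
}

func main() {
	opts := leaderElectionOptions{
		LeaderElection:          true,
		LeaderElectionID:        "cf-k8s-networking-routecontroller",
		LeaderElectionNamespace: "cf-system",
	}
	if err := validate(opts); err != nil {
		fmt.Println("invalid leader election config:", err)
		return
	}
	fmt.Printf("lock: %s/%s\n", opts.LeaderElectionNamespace, opts.LeaderElectionID)
}
```

In a real controller the same three values would be passed to the manager options directly; the guard here only illustrates failing fast instead of relying on defaults.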

We fixed this first issue in https://github.com/cloudfoundry/cf-k8s-networking/commit/a8d9323599bc3d43f3f7a2791d74aeb4f12d1cbb, but leader election is still failing: the controller now tries to use the leases.coordination.k8s.io resource, and its service account lacks the necessary RBAC permissions for it.

E0319 16:14:23.174545       1 leaderelection.go:325] error retrieving resource lock cf-system/cf-k8s-networking-routecontroller: leases.coordination.k8s.io "cf-k8s-networking-routecontroller" is forbidden: User "system:serviceaccount:cf-system:routecontroller" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "cf-system"
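
The error above says that the routecontroller service account cannot get leases in the coordination.k8s.io API group. As a rough illustration of the kind of rule that is missing, the sketch below models an RBAC rule with a local stand-in struct (policyRule, a hypothetical name loosely mirroring the apiGroups, resources, and verbs fields of a Kubernetes Role rule) and checks a request against it. The verb list (get, create, update) is an assumption about what a lease-based lock typically needs, not the rule that was actually added to this repository. Wildcards and resourceNames matching are deliberately left out.

```go
package main

import "fmt"

// policyRule is a simplified, hypothetical stand-in for a Kubernetes RBAC rule.
type policyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// contains reports whether want appears in list.
func contains(list []string, want string) bool {
	for _, s := range list {
		if s == want {
			return true
		}
	}
	return false
}

// allows reports whether any rule grants verb on resource in the API group.
func allows(rules []policyRule, group, resource, verb string) bool {
	for _, r := range rules {
		if contains(r.APIGroups, group) && contains(r.Resources, resource) && contains(r.Verbs, verb) {
			return true
		}
	}
	return false
}

func main() {
	// Assumed verbs for a lease-based leader election lock.
	leaseRule := policyRule{
		APIGroups: []string{"coordination.k8s.io"},
		Resources: []string{"leases"},
		Verbs:     []string{"get", "create", "update"},
	}

	// Without any lease rule the request from the log is denied.
	fmt.Println(allows(nil, "coordination.k8s.io", "leases", "get")) // false

	// With the lease rule in place the same request is allowed.
	fmt.Println(allows([]policyRule{leaseRule}, "coordination.k8s.io", "leases", "get")) // true
}
```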

tcdowney commented 3 years ago

We manually validated the fix as follows:

  1. We scaled the routecontroller to 2 replicas and observed the logs for each Pod.
  2. We observed that one Pod was doing work and reconciling Route resources. The other was not doing work because it did not hold the lease. It logged something like: I0322 17:25:50.999788 1 leaderelection.go:243] attempting to acquire leader lease cf-system/cf-k8s-networking-routecontroller...
  3. We confirmed that the integration tests were now passing for routecontroller.