admiraltyio / admiralty

A system of Kubernetes controllers that intelligently schedules workloads across clusters.
https://admiralty.io
Apache License 2.0

Broken federation target prevents others from working #106

Closed dimm0 closed 3 years ago

dimm0 commented 3 years ago

I have 2 targets set up in different namespaces. One target broke (the token was recreated in the remote cluster, so Admiralty couldn't authenticate), and that also broke federation for the other namespace: its remote pod was stuck terminating and new ones were not starting. Once I fixed the config, the other namespace started working fine.

It would be nice to handle the different federation links in independent goroutines, so that a single user can't break the whole cluster.

dimm0 commented 3 years ago

@adrienjt will you fix this, or am I on my own? :)

adrienjt commented 3 years ago

I was able to repro this issue. This only happens if the Admiralty controller manager is restarted while the token has expired.

  1. Admiralty OK
  2. Break target by deleting token in remote cluster
  3. Admiralty still OK for other targets
  4. Restart Admiralty controller manager
  5. Admiralty waits indefinitely to sync the broken target's informers, so controllers that depend on any of them can't start (they're stuck waiting on them to sync), e.g., the feedback controller depends on pod chaperon informers from all targets.

We shouldn't wait indefinitely for informers to sync, and we should make sure we handle stale caches.

Or, like you said, for each type of controller, we could run one controller per target instead of a single fan-out controller.