Panfactum / stack

The Panfactum Stack
https://panfactum.com

[Bug]: kube-fledged image cache sync interferes with Karpenter scale-down #127

Closed fullykubed closed 1 month ago

fullykubed commented 2 months ago

What happened?

Kube-fledged periodically runs pods on every node that attempt to pull images to ensure that node's image cache is up to date. This runs every 3 minutes in the current stack configuration.

However, while these pods are running, Karpenter cannot disrupt the nodes because the kube-fledged pods are bound to their nodes and cannot be rescheduled onto different nodes (a requirement for Karpenter scale-down). Since kube-fledged runs so often, this often leaves Karpenter perpetually unable to disrupt nodes.
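
To make the blocking condition concrete, here is a minimal Python sketch of the check described above. It is a simplified model, not Karpenter's actual scheduling simulation: the `Pod` class, the function names, and the hostnames are hypothetical, and it assumes a pod pinned to a node is expressed as a required `kubernetes.io/hostname` match.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Pod:
    name: str
    # Hostnames allowed by the pod's required node affinity.
    # None means the pod has no hostname requirement (assumption of this model).
    allowed_hostnames: Optional[FrozenSet[str]] = None


def can_reschedule(pod: Pod, replacement_hostname: str) -> bool:
    """A pod can move only if the replacement node satisfies its hostname requirement."""
    if pod.allowed_hostnames is None:
        return True
    return replacement_hostname in pod.allowed_hostnames


def node_is_disruptable(pods: List[Pod], replacement_hostname: str) -> bool:
    """In this model, a node can be disrupted only if every pod on it can be rescheduled."""
    return all(can_reschedule(pod, replacement_hostname) for pod in pods)


regular = Pod("web")
pinned = Pod("image-pull", frozenset({"node-a"}))  # bound to its current node

print(node_is_disruptable([regular], "node-b"))          # True
print(node_is_disruptable([regular, pinned], "node-b"))  # False
```

In this model a single pinned pod is enough to make the whole node ineligible, which is why a sync that starts such pods every few minutes can keep nodes from ever being disrupted.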

The challenge is that the kube-fledged sync does not run automatically on new node creation, so unless the sync runs often, it's possible a node might not have images in its image cache when needed.

Personally, I think we might need to fork kube-fledged to add this capability (running the sync when a node is created), since the project seems relatively unmaintained.

Steps to Reproduce

Default behavior of the stack. Simply observe.

Relevant log output

not all pods would schedule, linkerd/linkerd-proxy-czh6d-7h5r4 =>
  incompatible with nodepool "spot-arm", daemonset overhead={"cpu":"304m","memory":"815053350","pods":"8"}, incompatible requirements, key kubernetes.io/hostname, kubernetes.io/hostname In [ip-10-0-150-93.us-east-2.compute.internal] not in kubernetes.io/hostname In [hostname-placeholder-5098];
  incompatible with nodepool "spot", daemonset overhead={"cpu":"304m","memory":"815053350","pods":"8"}, incompatible requirements, key kubernetes.io/hostname, kubernetes.io/hostname In [ip-10-0-150-93.us-east-2.compute.internal] not in kubernetes.io/hostname In [hostname-placeholder-5099];
  incompatible with nodepool "burstable-arm", daemonset overhead={"cpu":"304m","memory":"815053350","pods":"8"}, incompatible requirements, key kubernetes.io/hostname, kubernetes.io/hostname In [ip-10-0-150-93.us-east-2.compute.internal] not in kubernetes.io/hostname In [hostname-placeholder-5100];
  incompatible with nodepool "burstable", daemonset overhead={"cpu":"304m","memory":"815053350","pods":"8"}, incompatible requirements, key kubernetes.io/hostname, kubernetes.io/hostname In [ip-10-0-150-93.us-east-2.compute.internal] not in kubernetes.io/hostname In [hostname-placeholder-5101];
  incompatible with nodepool "on-demand-arm", daemonset overhead={"cpu":"304m","memory":"815053350","pods":"8"}, incompatible requirements, key kubernetes.io/hostname, kubernetes.io/hostname In [ip-10-0-150-93.us-east-2.compute.internal] not in kubernetes.io/hostname In [hostname-placeholder-5102];
  incompatible with nodepool "on-demand", daemonset overhead={"cpu":"304m","memory":"815053350","pods":"8"}, incompatible requirements, key kubernetes.io/hostname, kubernetes.io/hostname In [ip-10-0-150-93.us-east-2.compute.internal] not in kubernetes.io/hostname In [hostname-placeholder-5103]

wesbragagt commented 2 months ago

@fullykubed would this increase cost for users running kube_fledged?

fullykubed commented 2 months ago

@wesbragagt I am still collecting cost data to quantify the impact, but I suspect there is one.

For this and a few other reasons, we are likely going to fork the kube-fledged project and manage a custom version ourselves that plays nicer with modern cluster components (kube-fledged seems to be unmaintained). Our goal is to have that integrated by the next stable release.

fullykubed commented 1 month ago

This is resolved with the introduction of Kyverno.