Define a curated list of priority classes for Giant Swarm workload

QuentinBisson commented 1 month ago

We concluded in @giantswarm/sig-architecture that we need a curated list of priority classes for Giant Swarm workloads. The list should be short (let's not have one class per app, as we do right now with crossplane and flux), but long enough to ensure that highly critical (kyverno, prometheus-agents), critical (promtail), and somewhat less critical (fluent-bit) components are scheduled with higher priority than other workloads.

We currently have the following classes:

Workload clusters

NAME                      VALUE        GLOBAL-DEFAULT
giantswarm-critical       1000000000   false
system-cluster-critical   2000000000   false
system-node-critical      2000001000   false

Management clusters

NAME                              VALUE        GLOBAL-DEFAULT
crossplane-critical               600000000    false
flux-giantswarm-flux-giantswarm   1000000000   false
giantswarm-critical               1000000000   false
prometheus                        500000000    false
system-cluster-critical           2000000000   false
system-node-critical              2000001000   false

The goals of this issue are to:

[ ] Establish a curated set of priority classes for all apps
[ ] Ensure we have the same priority classes on management and workload clusters and that they are deployed the same way (giantswarm-critical being deployed by the chart-operator)
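
The second goal can be checked mechanically. Below is a minimal sketch, not an existing tool, that diffs the two listings above. The names and values are copied from those listings; the helper `diff_priority_classes` and the dictionary names are hypothetical and only serve the illustration.

```python
# Name -> value maps copied from the listings above.
WORKLOAD = {
    "giantswarm-critical": 1_000_000_000,
    "system-cluster-critical": 2_000_000_000,
    "system-node-critical": 2_000_001_000,
}
MANAGEMENT = {
    "crossplane-critical": 600_000_000,
    "flux-giantswarm-flux-giantswarm": 1_000_000_000,
    "giantswarm-critical": 1_000_000_000,
    "prometheus": 500_000_000,
    "system-cluster-critical": 2_000_000_000,
    "system-node-critical": 2_000_001_000,
}


def diff_priority_classes(a, b):
    """Return (only_in_a, only_in_b, value_mismatches) for two name -> value maps."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    # Classes present on both sides but defined with different values.
    mismatched = sorted(n for n in set(a) & set(b) if a[n] != b[n])
    return only_a, only_b, mismatched


only_wc, only_mc, mismatched = diff_priority_classes(WORKLOAD, MANAGEMENT)
print("only on workload clusters:", only_wc)
print("only on management clusters:", only_mc)
print("same name, different value:", mismatched)
```

With the current listings, the three classes that exist only on management clusters are crossplane-critical, flux-giantswarm-flux-giantswarm, and prometheus; the shared classes already agree on their values.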

@giantswarm/sig-architecture Do you have a list of the components that we run in CAPI clusters, so I can create a table with their priority classes?

piontec commented 1 month ago

OK, so it seems flux-giantswarm should just use giantswarm-critical. I think we need something like giantswarm-high: lower priority than critical, but still for, well, important stuff like crossplane. Maybe to get started we keep the 2 system-* classes, as I believe they don't really apply to "normal" apps, only to really critical system components. Then, for important apps, we could use something like:

giantswarm-critical
giantswarm-very-high
giantswarm-high

WDYT?

QuentinBisson commented 1 month ago

I would be more inclined to go with:

giantswarm-critical
giantswarm-high
giantswarm-medium

That way we can add giantswarm-low if we need to. I'm not sure I would add anything in between those.

We could instead use the following (yes, the migration effort will take time, so we could have giantswarm-critical = giantswarm-high):

giantswarm-high
giantswarm-medium
giantswarm-low

Now I'm not sure which components would go in each class, though. Are we fine with prometheus being in the high priority class?
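
Whichever names we settle on, the values need to stay consistent. Here is a rough sketch of a sanity check for the high/medium/low proposal above. Only the tier names come from this thread; the numeric values for medium and low are made-up placeholders, and `validate_tiers` is a hypothetical helper. The 1000000000 ceiling is the Kubernetes limit for user-defined priority classes (larger values are reserved for the built-in system-* classes).

```python
# Kubernetes accepts user-defined PriorityClass values up to 1_000_000_000;
# larger values are reserved for the built-in system-* classes.
MAX_USER_PRIORITY = 1_000_000_000

# Hypothetical values: only the tier names come from the discussion.
PROPOSED_TIERS = [
    ("giantswarm-high", 1_000_000_000),   # same value as today's giantswarm-critical
    ("giantswarm-medium", 900_000_000),   # placeholder
    ("giantswarm-low", 800_000_000),      # placeholder
]


def validate_tiers(tiers):
    """Return a list of problems for tiers given from highest to lowest priority."""
    problems = []
    for name, value in tiers:
        if value > MAX_USER_PRIORITY:
            problems.append(f"{name}: {value} exceeds {MAX_USER_PRIORITY}")
    # Each tier must have a strictly higher value than the tier after it.
    for (hi_name, hi), (lo_name, lo) in zip(tiers, tiers[1:]):
        if hi <= lo:
            problems.append(f"{hi_name} ({hi}) must be greater than {lo_name} ({lo})")
    return problems


problems = validate_tiers(PROPOSED_TIERS)
print(problems or "tier scheme is consistent")
```

Keeping giantswarm-high at the current giantswarm-critical value would let both names coexist during the migration without changing scheduling order for existing workloads.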

giantswarm / roadmap

Define a curated list of priority classes for Giant Swarm workload #3483
