cncf-tags / green-reviews-tooling

Project Repository for the WG Green Reviews which is part of the CNCF TAG Environmental Sustainability
https://github.com/cncf/tag-env-sustainability/tree/main/working-groups/green-reviews
Apache License 2.0

[Action] Separate nodes for Falco and our internal stack #30

Closed rossf7 closed 8 months ago

rossf7 commented 9 months ago

This issue is to create separate Kubernetes nodes for our internal stack (Flux / Prometheus) and for Falco (the first project we're measuring).

Node isolation is important to ensure we can measure the footprint of projects accurately.

See https://github.com/falcosecurity/cncf-green-review-testing/issues/2 for Falco node requirements

Node requirements

The nodes will be managed by OpenTofu and have the node labels below.

Internal stack node (Flux / Prometheus):

cncf-project: wg-green-reviews
cncf-project-sub: internal

Falco node:

cncf-project: falco
cncf-project-sub: falco-driver-modern-ebpf

We will start by using node labels and selectors to place pods on nodes, but we may also need to introduce node taints and tolerations.
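As a minimal sketch, the labels above could be consumed via a pod spec `nodeSelector` (the pod name, namespace, and image tag here are illustrative, not from this issue):

```yaml
# Hypothetical pod spec pinning a Falco pod to the Falco node
# using the cncf-project / cncf-project-sub labels defined above.
apiVersion: v1
kind: Pod
metadata:
  name: falco-benchmark   # illustrative name
spec:
  nodeSelector:
    cncf-project: falco
    cncf-project-sub: falco-driver-modern-ebpf
  containers:
    - name: falco
      image: falcosecurity/falco:latest  # tag is an assumption
```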

dipankardas011 commented 9 months ago

I think I can help

rossf7 commented 9 months ago

@dipankardas011 Thank you! My initial thought on this was to replace the list var of worker nodes here with a map that includes the labels. WDYT?

We would need to use node selectors for Prometheus, Flux and any other components so they run on the internal node.

Another approach is to add a taint to the falco node and ask the falco team to add a toleration in https://github.com/falcosecurity/cncf-green-review-testing

cc @nikimanoledaki @AntonioDiTuri
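If we went with the taint approach instead, a sketch might look like the following (the taint key/value reuse the node labels above; the exact key is an assumption, not something agreed in this issue):

```yaml
# Hypothetical: taint the Falco node, e.g.
#   kubectl taint nodes <falco-node> cncf-project=falco:NoSchedule
# then the Falco pod spec would need a matching toleration:
tolerations:
  - key: cncf-project
    operator: Equal
    value: falco
    effect: NoSchedule
```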

dipankardas011 commented 9 months ago

Yes, we can do this using node label selection or taints and tolerations.

Actually, I was working on a related problem where I wanted to schedule pods only to control plane nodes using taints and tolerations:

https://github.com/kubesimplify/ksctl/blob/ade22eebe56a3d79dee4892b2b0b331ff71b47ef/internal/k8sdistros/universal/ksctl.go#L28-L68

https://github.com/kubesimplify/ksctl/blob/ade22eebe56a3d79dee4892b2b0b331ff71b47ef/internal/k8sdistros/universal/ksctl.go#L88-L95

Maybe you can tell me if this helps with this issue.

rossf7 commented 9 months ago

@dipankardas011 That's a nice approach, and in future we may want to run some workloads on our control plane node if we start to max out the "system" node where we run Prometheus, Flux, etc.

A downside I see with a taint per project is that we need to run Kepler on all nodes, so we'd need to add tolerations for all the taints.

How about we start with adding node selectors to the kube-prometheus-stack helm release and the flux bootstrap?

https://fluxcd.io/flux/installation/configuration/boostrap-customization/

If adding a node selector for each component becomes too hard to manage we can look at alternatives later.
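A sketch of what those node selectors might look like in the kube-prometheus-stack HelmRelease values (the keys follow the kube-prometheus-stack chart's values layout; verify against the chart's values.yaml, and the label used here is the one proposed in the issue):

```yaml
# Pin kube-prometheus-stack components to the internal node.
prometheus:
  prometheusSpec:
    nodeSelector:
      cncf-project-sub: internal
grafana:
  nodeSelector:
    cncf-project-sub: internal
kube-state-metrics:
  nodeSelector:
    cncf-project-sub: internal
```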

dipankardas011 commented 9 months ago

I have seen scheduler profiles used (to reduce the use of node selectors etc. by just modifying the scheduling profile), but it has a major downside (no support for DaemonSet pods): https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#node-affinity-per-scheduling-profile

Check the Notes section.

dipankardas011 commented 9 months ago

Query

  1. I need to modify the fluxcd manifest for installing Prometheus, Kepler, ...
  2. Even if we do that, the Prometheus node exporter will be a DaemonSet and thus present on every node; not sure about Kepler!

cc @rossf7 @nikimanoledaki @AntonioDiTuri

rossf7 commented 9 months ago

I need to modify the fluxcd manifest for installing Prometheus, Kepler, ... even if we do that, the Prometheus node exporter will be a DaemonSet and thus present on every node; not sure about Kepler!

@dipankardas011 The node selectors are needed in the kube-prometheus-stack helm release and also for the flux components.

https://github.com/fluxcd/flux2/issues/2252#issuecomment-1002790427 https://fluxcd.io/flux/installation/configuration/boostrap-customization/
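Following the bootstrap customization docs linked above, a sketch of the flux-system `kustomization.yaml` patch might look like this (the `app.kubernetes.io/part-of=flux` label selector follows the Flux docs' patch examples; the node label is the one from this issue):

```yaml
# Sketch: patch all Flux controllers to run on the internal node.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  - patch: |
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: all
      spec:
        template:
          spec:
            nodeSelector:
              cncf-project-sub: internal
    target:
      kind: Deployment
      labelSelector: app.kubernetes.io/part-of=flux
```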

even if we do that, the Prometheus node exporter will be a DaemonSet and thus present on every node; not sure about Kepler!

It's fine for the Kepler DaemonSet to schedule pods on all nodes. That way we can measure the overall energy consumption of the cluster.