aws / containers-roadmap

This is the public roadmap for AWS container services (ECS, ECR, Fargate, and EKS).
https://aws.amazon.com/about-aws/whats-new/containers/

[EKS] [request]: Using Karpenter in a multi-tenancy EKS Cluster - Need to assign Label to Worker nodes with namespace name #1966

Open mmasaaud opened 1 year ago

mmasaaud commented 1 year ago


Tell us about your request
What do you want us to build?

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
What outcome are you trying to achieve, ultimately, and why is it hard/impossible to do right now? What is the impact of not having this problem solved? The more details you can provide, the better we'll be able to understand and solve the problem.

Are you currently working around this issue?
How are you currently solving this problem?

This is not really working at a large scale!

Additional context
Anything else we should know?

This is blocking us from certifying Karpenter within our environment.

Attachments
If you think you might have additional information that you'd like to include via an attachment, please do - we'll take a look. (Remember to remove any personally-identifiable information.)

dims commented 1 year ago

cc @ellistarn

ellistarn commented 1 year ago

This may be a better question for https://github.com/aws/karpenter/issues, but happy to answer here.

> We are looking to have Karpenter and the EKS scheduler be more multi-tenant friendly by, at a minimum, requiring the Provisioner CRD to be namespaced.

Provisioners can't be scoped to a namespace using Karpenter-specific concepts. Even if Karpenter implemented something like this, the kube-scheduler would not enforce its decisions once the nodes came online. Instead, this must be implemented using Kubernetes-native scheduling rules (i.e. taints and node affinity).

As you've identified, one way to separate workloads with Kubernetes-native scheduling rules is to use node selectors to ensure that different tenants run on different nodes. To repel workloads that don't specify the nodeAffinity, you can add a taint to the tenant's nodes and a corresponding toleration to the tenant's pods.
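As a rough illustration of that pattern (not from the thread itself), a per-tenant Provisioner could label and taint its nodes, and the tenant's pods would select and tolerate them. This sketch assumes the v1alpha5 Provisioner API from the v0.25.x docs; the `tenant` label/taint key, the `team-a` value, and the container image are made up for the example, and provider-specific fields are omitted:

```yaml
# Provisioner dedicated to one tenant: it labels the nodes it launches and
# taints them so only pods that explicitly tolerate the taint can land there.
# (providerRef/requirements omitted for brevity.)
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: team-a
spec:
  labels:
    tenant: team-a                 # assumed tenant label key/value
  taints:
    - key: tenant
      value: team-a
      effect: NoSchedule           # repels pods without a matching toleration
---
# Tenant workload: the nodeSelector attracts it to team-a nodes, and the
# toleration lets it past the taint that keeps other tenants' pods off.
apiVersion: v1
kind: Pod
metadata:
  name: team-a-app
  namespace: team-a
spec:
  nodeSelector:
    tenant: team-a
  tolerations:
    - key: tenant
      operator: Equal
      value: team-a
      effect: NoSchedule
  containers:
    - name: app
      image: public.ecr.aws/docker/library/nginx:latest
```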

However, this places a burden on service teams to apply the correct node affinity to their workloads. If they don't, their pods may not schedule, or worse, may schedule onto spare multi-tenant capacity. One way to solve this (as you've identified) is with a policy enforcement system like OPA or Kyverno. These systems use mutating webhooks to inject things like nodeAffinity into pods. However, you need this nodeAffinity to be dynamic, keyed on the pod's namespace.
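For concreteness, this is roughly what such a namespace-driven mutation could look like as a Kyverno ClusterPolicy (a sketch, not something from the thread; the `tenant` node label key is an assumption and would need to match whatever label the provisioner puts on nodes):

```yaml
# Sketch of a Kyverno mutating policy that stamps every pod with a
# nodeSelector derived from the pod's namespace.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: inject-tenant-node-selector
spec:
  rules:
    - name: add-namespace-node-selector
      match:
        any:
          - resources:
              kinds:
                - Pod
      mutate:
        patchStrategicMerge:
          spec:
            nodeSelector:
              tenant: "{{request.object.metadata.namespace}}"   # assumed label key
```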

> Using OPA Gatekeeper or Kyverno is not really efficient in this situation today in our environment. Enforcing multi-tenancy with OPA brings a lot of problems on our side; basically, we would have to think through a lot of integrations and policies. Therefore, Karpenter would be narrowed to only a few use cases/clients within our org.

As written, it's not clear to me what problems you're facing with existing policy managers. If I were to guess, I'd assume that it's not possible to easily express namespace-based nodeAffinity rules with them.

We have a few options:

  1. Build this into upstream k8s (long road)
  2. Build this into Karpenter (e.g. https://github.com/aws/karpenter-core/issues/74)
  3. Build this as a new 3p project (e.g. github.com/kubernetes-sigs/tenancy-webhook)

I've played around with #3 in the past and was able to hack up a solution in roughly 100 lines of code. I'd be happy to provide some pointers if this is something you would be interested in taking on. #2 is possible, but I worry a little about expanding Karpenter's scope to include pod webhooks -- this type of policy enforcement seems like a better fit for existing policy enforcement solutions.

As a final note (you didn't mention this in your post, so it's possible you haven't run into it yet): if you can get something to inject the scheduling rules, you can achieve pod isolation using a single Karpenter provisioner and the Exists operator: https://karpenter.sh/v0.25.0/concepts/scheduling/#exists-operator.
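Concretely, that pattern looks something like the following sketch (v1alpha5 API; the `company.com/team` key mirrors the key used in the linked docs, while the pod details are illustrative). The single Provisioner accepts any value for the team label, and Karpenter copies the value requested by each pod onto the node it launches, so pods from different tenants end up on different nodes:

```yaml
# One shared Provisioner for all tenants, using the Exists operator.
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: shared
spec:
  requirements:
    - key: company.com/team       # tenant label key; any value is accepted
      operator: Exists
---
# An injected (or hand-written) nodeSelector pins this pod to team-a nodes;
# Karpenter will launch a node labeled company.com/team=team-a for it.
apiVersion: v1
kind: Pod
metadata:
  name: team-a-workload
  namespace: team-a
spec:
  nodeSelector:
    company.com/team: team-a
  containers:
    - name: app
      image: public.ecr.aws/docker/library/nginx:latest
```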