aws / containers-roadmap

This is the public roadmap for AWS container services (ECS, ECR, Fargate, and EKS).
https://aws.amazon.com/about-aws/whats-new/containers/

[EKS] [Feature]: Allow Kube Scheduler Customization #1468

Open · Kausheel opened this issue 3 years ago

Kausheel commented 3 years ago


Tell us about your request: What do you want us to build?

It would be great if EKS allowed users to configure Kube Scheduler parameters. The scheduler is a control-plane component, so users don't have access to it by default. Exposing the Kube Scheduler configuration, either via AWS APIs or via the KubeSchedulerConfiguration resource type, would be a significant advantage for EKS users.

Which service(s) is this request for? EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

Use cases for this might include switching from equal Pod distribution to a bin-packing approach, which improves cost efficiency. There are many other scheduler parameters users might want to tune themselves.

Are you currently working around this issue? By running custom Kube Schedulers. This is not ideal, since maintaining and updating a custom Kube Scheduler adds operational overhead. It may also require tools like OPA to inject a custom schedulerName field into the target Pods, which is yet another burden on the user.
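
For reference, the per-workload half of that workaround is a one-field change on every Pod; a minimal sketch, where my-custom-scheduler is a hypothetical scheduler name:

apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  schedulerName: my-custom-scheduler   # hypothetical custom scheduler; omit to use the default scheduler
  containers:
    - name: app
      image: busybox:1.36
      command: ["sleep", "3600"]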

Thanks!

ashishapy commented 3 years ago

Another use case is defining cluster-level default constraints for PodTopologySpread in the scheduler, per the docs: https://kubernetes.io/docs/concepts/workloads/pods/pod-topology-spread-constraints/#cluster-level-default-constraints

AWS should make this the default behaviour in EKS clusters:

# Cluster-wide default PodTopologySpread constraints (scheduler API v1beta1)
apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
profiles:
  - pluginConfig:
      - name: PodTopologySpread
        args:
          defaultConstraints:
            - maxSkew: 1
              topologyKey: topology.kubernetes.io/zone
              whenUnsatisfiable: ScheduleAnyway
          defaultingType: List   # apply the list above instead of the built-in system defaults

stijndehaes commented 1 year ago

I would love to use this to enable bin packing, as explained here: https://kubernetes.io/docs/concepts/scheduling-eviction/resource-bin-packing/

sherifabdlnaby commented 1 year ago

Upvote.

Achieving bin packing on EKS is hard without changing the scheduler's scoring behavior to favor MostAllocated.

logyball commented 1 year ago

Note that this feature is supported to some extent in Azure, and the MostAllocated scheduler scoring strategy use case is covered in GKE via the autoscaling profile (note: this is an assumption on my part; GKE does not explicitly document what this setting does under the hood). Adding this ability would help EKS users gain parity in that sense.

stijndehaes commented 1 year ago

I would be fine with a setting like GKE's; that would solve my use case. It probably does not solve every use case out there, but I can understand if the AWS EKS team is reluctant to allow changing the whole configuration.

boblee0717 commented 1 year ago

Imagine if this feature were opened up to all EKS users: it would save them a lot of time. Suppose it takes one person a week to work around this with a custom kube-scheduler; if 1,000 users need it, that's 7,000 days, a whole working life of one person.

alex-berger commented 1 year ago

With Kubernetes v1.24, the DefaultPodTopologySpread feature graduated to GA (https://github.com/kubernetes/kubernetes/pull/108278). Without access to the scheduler configuration, we have no way to use (resp. configure) it on EKS clusters.

AnhQKatalon commented 1 year ago

Same here. We need this feature to enable resource bin packing for cost savings: https://kubernetes.io/docs/concepts/scheduling-eviction/resource-bin-packing/

Art3mK commented 1 year ago

@AnhQKatalon, run the scheduler yourself with the settings you need, and patch Pods to use that scheduler with Kyverno, for example :) It could be done in a couple of hours.

AnhQKatalon commented 1 year ago

> @AnhQKatalon, run the scheduler yourself with the settings you need, and patch Pods to use that scheduler with Kyverno, for example :) It could be done in a couple of hours.

Yeah, I am doing the workaround this way. Appreciate your help. But it would be great if EKS supported changing the scheduler configuration officially.
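
For anyone following the same route, a minimal sketch of the Kyverno half of that workaround, assuming Kyverno is installed; the policy name and the bin-packing-scheduler name are both illustrative:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: use-custom-scheduler
spec:
  rules:
    - name: set-scheduler-name
      match:
        any:
          - resources:
              kinds:
                - Pod
      exclude:
        any:
          - resources:
              namespaces:
                - kube-system   # leave system Pods on the default scheduler
      mutate:
        patchStrategicMerge:
          spec:
            # the +( ) anchor only adds the field if it is not already set
            +(schedulerName): bin-packing-scheduler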

babinos87 commented 1 year ago

As others mentioned, this is required to set default Pod topology spread constraints on the cluster, per: https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/#cluster-level-default-constraints. There are other use cases too, I am sure of it.

There are workarounds, of course, but this seems like a core capability to offer in order to make the lives of EKS users easier. I think this is a MUST.

fernandesnikhil commented 1 year ago

This would be very helpful for the same reasons mentioned by others above.

The suggestion of rolling your own scheduler is not appealing, because EKS might have bolted on its own tweaks to get the scheduler to work right in AWS, and we'd lose all of that; and then there's maintaining it. I get that modifying the EKS-blessed configuration can lead to instability, but if I want to modify just a few settings, I should be allowed to do that with the understanding that it could break scheduling on my cluster. Upstream Kubernetes allows it, and it's useful.

subhranil05 commented 1 year ago

If it's not possible to add kube-scheduler customization, can we think about a feature like GKE's, where node groups have an option to scale with a MostAllocated-like strategy, the way GKE's optimize-utilization autoscaling profile does?

sherifabdlnaby commented 1 year ago

@subhranil05 That is not an alternative solution. Scaling node groups can only achieve bin packing during scale-up events; Kube Scheduler customization is necessary for in-place, proactive bin packing.

m00lecule commented 1 year ago

Can somebody take a look and consider adding this issue to the kanban board? Demand is clearly still there in 2023, as the issue has been active for more than two years. Of course we can self-manage an additional kube-scheduler, but it's counterintuitive to pay for an AWS-managed EKS control plane and still run self-managed control-plane components (an additional kube-scheduler).

CC @tabern @mikestef9

paulchambers commented 1 year ago

This would be very useful for my EKS clusters. I want to be able to set sensible defaults without having to run my own scheduler.

cskinfill commented 1 year ago

I would love to see this as well, to support bin packing at scheduling time.

sherifabdlnaby commented 1 year ago

Do it for the environment, folks!

Legion2 commented 11 months ago

I want to use bin packing with Karpenter for job workloads, so Karpenter can scale down empty nodes after a scale-up. Instead of spreading Pods across many nearly empty nodes, they should be packed onto a few full nodes, enabling Karpenter to remove each node once the last job running on it completes.

onelapahead commented 11 months ago

Assuming AWS may not prioritize this for a while at the current rate, I think an example deployment of a custom scheduler with MostAllocated enabled for bin packing would benefit everyone here (as suggested in https://github.com/aws/containers-roadmap/issues/1468#issuecomment-1645021158), despite the burden it puts on (1) cluster admins, to maintain control-plane infra in step with EKS versions, and (2) Pod creators, to ensure the custom scheduler is used. Kyverno, Gatekeeper, or custom webhooks could potentially help with the latter.

https://kubernetes.io/docs/tasks/extend-kubernetes/configure-multiple-schedulers/

That is a starting point, but if anyone has tested manifest samples for the bin-packing configuration everyone wants, that would be appreciated. If I get to this at some point, I will share.

In some clusters, I've seen something like this provided:

# Bias the default scheduler toward packing Pods onto fewer nodes (scheduler API v1beta2)
apiVersion: kubescheduler.config.k8s.io/v1beta2
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: /var/lib/kube-scheduler/kubeconfig
profiles:
- schedulerName: default-scheduler
  pluginConfig:
    - name: NodeResourcesFit
      args:
        scoringStrategy:
          type: MostAllocated   # prefer nodes that are already heavily allocated
  plugins:
    score:
      disabled:
      - name: "NodeResourcesBalancedAllocation"   # favors spreading, so turn it off
      enabled:
      - name: "NodeResourcesFit"
        weight: 5

vinay92-ch commented 9 months ago

We ran into this same issue and had to set up a custom scheduler to implement bin packing. It's the same kube-scheduler image with a MostAllocated scoring policy, as suggested above. The blog has more details about how we dealt with overprovisioning and system workloads, and the rollout to all Pods. This section has the specific scheduler config.

We were able to achieve this in GCP by using the optimize-utilization setting in GKE, but for Azure AKS we still have to use this secondary scheduler with a custom scoring policy.

MattLJoslin commented 8 months ago

How is this API still not supported? Is there any plan to support it soon? It's part of the standard Kubernetes service, but there's no way to use it on EKS? This really doesn't make EKS very usable in our case. All of the major packages assume the standard APIs are available.

eliran-zada-zesty commented 7 months ago

Same as @MattLJoslin said... we really need it as well

stevehipwell commented 6 months ago

I think being able to run the scheduler in MostAllocated mode would make the Karpenter use case even more compelling.

stevehipwell commented 5 months ago

https://www.cncf.io/blog/2024/06/03/tackling-gpu-underutilization-in-kubernetes-runtimes/

jukie commented 1 month ago

Any updates on this?

nikimanoledaki commented 1 month ago

+1 for being able to add a pluginConfig for PodTopologySpread as well as a MostAllocated scoring policy.

> Do it for the environment, folks!

And this!
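
For what it's worth, both asks fit in a single scheduler profile. A minimal sketch combining the two examples from earlier in the thread, assuming the GA scheduler API (kubescheduler.config.k8s.io/v1, Kubernetes 1.25+):

apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler
    pluginConfig:
      - name: PodTopologySpread
        args:
          defaultConstraints:
            - maxSkew: 1
              topologyKey: topology.kubernetes.io/zone
              whenUnsatisfiable: ScheduleAnyway
          defaultingType: List   # use the defaults above instead of system defaults
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            type: MostAllocated   # bin-packing: prefer fuller nodes
    plugins:
      score:
        disabled:
          - name: NodeResourcesBalancedAllocation
        enabled:
          - name: NodeResourcesFit
            weight: 5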

woehrl01 commented 1 month ago

Hint in the meantime: you can use the following AWS-managed image to provision the scheduler yourself, without needing to self-maintain the image: https://gallery.ecr.aws/eks-distro/kubernetes/kube-scheduler
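
To make that concrete, a minimal sketch of a second-scheduler Deployment using that image, following the upstream multiple-schedulers guide linked earlier in the thread; the names, the image tag, and the RBAC wiring are all illustrative:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: bin-packing-scheduler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: bin-packing-scheduler
  template:
    metadata:
      labels:
        app: bin-packing-scheduler
    spec:
      serviceAccountName: bin-packing-scheduler   # needs scheduler RBAC, per the upstream guide
      containers:
        - name: kube-scheduler
          # illustrative tag; pick one matching your control-plane version
          image: public.ecr.aws/eks-distro/kubernetes/kube-scheduler:v1.28.4-eks-1-28-latest
          command:
            - kube-scheduler
            - --config=/etc/kubernetes/scheduler-config.yaml
          volumeMounts:
            - name: config
              mountPath: /etc/kubernetes
              readOnly: true
      volumes:
        - name: config
          configMap:
            name: bin-packing-scheduler-config   # holds a KubeSchedulerConfiguration like the ones above

The KubeSchedulerConfiguration in that ConfigMap should use a custom schedulerName (the managed control plane already runs default-scheduler) and either disable leader election or give it its own lease, per the upstream guide; Pods then opt in via spec.schedulerName.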

jukie commented 1 month ago

@woehrl01 That's a viable workaround, but then users have to manage the scheduler component themselves and update every workload to target it in the Pod spec. For a managed Kubernetes service, it would be ideal if these options were exposed to the user as configuration instead.

woehrl01 commented 1 month ago

@jukie I'm not disputing that this would be a nice addition. I just wanted to mention a solution that doesn't require waiting more than 3 years, and to share it with newcomers who may be having trouble self-maintaining the image, etc.

jukie commented 1 month ago

Totally agree, and thanks for sharing!