aws / karpenter-provider-aws

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
https://karpenter.sh
Apache License 2.0

karpenter not provisioning new node for a pending pod which was deemed as unschedulable by custom-scheduler on available nodes #7265

Open shaarifkhan opened 1 week ago

shaarifkhan commented 1 week ago

Description

Observed Behavior: We have deployed a custom scheduler in our EKS cluster that takes some extra parameters into account when scheduling pods onto nodes. It stops scheduling onto a node once a specific number of pods are already in a Running state on that node, so it can happen that a node has the capacity to fit a pod but the custom scheduler still won't schedule onto it.
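To make the scenario concrete, here is a minimal sketch of the kind of filter logic such a custom scheduler extender might apply: reject any node that already has N running pods, regardless of remaining capacity. The type names, field names, and the cap of 10 are simplified stand-ins for illustration, not the real extender API or the issue author's actual code.

```go
package main

import "fmt"

// Simplified stand-ins for the kube-scheduler extender API types
// (k8s.io/kube-scheduler/extender/v1); field names here are
// illustrative, not the real API surface.
type Node struct {
	Name        string
	RunningPods int // hypothetical: a real extender would count pods via the API server
}

type ExtenderArgs struct {
	PodName string
	Nodes   []Node
}

type ExtenderFilterResult struct {
	Nodes       []Node            // nodes that passed the filter
	FailedNodes map[string]string // node name -> reason it was rejected
}

// maxRunningPodsPerNode is an assumed cap; the issue only says
// "a specific number of pods".
const maxRunningPodsPerNode = 10

// filter mirrors the filter step of a scheduler extender: it drops any
// node that already has the maximum number of running pods, even if the
// node has spare CPU/memory capacity.
func filter(args ExtenderArgs) ExtenderFilterResult {
	result := ExtenderFilterResult{FailedNodes: map[string]string{}}
	for _, n := range args.Nodes {
		if n.RunningPods >= maxRunningPodsPerNode {
			result.FailedNodes[n.Name] = fmt.Sprintf("already running %d pods", n.RunningPods)
			continue
		}
		result.Nodes = append(result.Nodes, n)
	}
	return result
}

func main() {
	res := filter(ExtenderArgs{
		PodName: "pending-pod",
		Nodes: []Node{
			{Name: "ip-10-0-253-116.ap-southeast-1.compute.internal", RunningPods: 12},
			{Name: "ip-10-0-1-20.ap-southeast-1.compute.internal", RunningPods: 4},
		},
	})
	for _, n := range res.Nodes {
		fmt.Println("schedulable:", n.Name)
	}
	for name, reason := range res.FailedNodes {
		fmt.Println("rejected:", name, "-", reason)
	}
}
```

In a real deployment this logic would sit behind an HTTP endpoint that the kube-scheduler calls during its filter phase; the key point for this issue is that Karpenter's scheduling simulation never sees this veto, so it believes the rejected node still fits the pod.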

For example, Karpenter emits this log for a pending pod: `Pod should schedule on: nodeclaim/shaarif-nodepool-fccsj, node/ip-10-0-253-116.ap-southeast-1.compute.internal`. This is the status of the pending pod:

status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2024-10-22T11:31:06Z"
    message: '0/33 nodes are available: 32 node(s) didn''t match Pod''s node affinity/selector.
      preemption: 0/33 nodes are available: 1 No preemption victims found for incoming
      pod, 32 Preemption is not helpful for scheduling.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending
  qosClass: BestEffort

Since the node has the capacity to fit the pod, but the custom scheduler filters that node out per its custom logic, the pod stays in the Pending state indefinitely.

Expected Behavior: Karpenter should spin up a node for a pod if it is marked unschedulable by any scheduler.

Reproduction Steps (Please include YAML): Deploy a custom scheduler with a scheduler extender that filters out otherwise-viable nodes based on additional constraints.
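For illustration, a minimal setup along those lines might look like the following: a KubeSchedulerConfiguration that registers an extender's filter endpoint, and a pod that opts into the custom scheduler via `schedulerName`. The scheduler name, service URL, and image below are placeholders I've assumed for the sketch, not details from the issue.

```yaml
# KubeSchedulerConfiguration for the custom scheduler; the extender's
# filter endpoint vetoes nodes based on the extra constraints.
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: custom-scheduler            # placeholder name
extenders:
  - urlPrefix: "http://scheduler-extender.kube-system.svc:8888"  # placeholder URL
    filterVerb: "filter"
    enableHTTPS: false
---
# A pod that asks to be scheduled by the custom scheduler rather than
# the default kube-scheduler.
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  schedulerName: custom-scheduler
  containers:
    - name: app
      image: nginx                             # placeholder image
```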

Versions:

- Karpenter version: 0.34.0
- Kubernetes release: 1.29.8
- custom-kube-scheduler image: v1.25.12

mahafrain commented 1 week ago

We have also been facing the same issue.

AffanNaeem commented 1 week ago

@shaarifkhan were you able to find a fix for this?

njtran commented 3 days ago

This seems like expected behavior to me, since we don't actually use the kube-scheduler's code to schedule pods; we implement our own scheduling simulation. A natural gap in that simulation is that we aren't aware of custom scheduler plugins, and thus can fail to schedule as the cluster expects.

I think the way to fix this would really just be allowing users to define custom scheduling plugins, and I'd be interested to know how many users want this.

njtran commented 3 days ago

Can you open this in github.com/kubernetes-sigs/karpenter? I'd like to track it as a feature request there.

shaarifkhan commented 3 days ago

So, from this, what I understand is: Karpenter simulates the behaviour of the default kube-scheduler to find potential nodes for scheduling, and if the simulation finds none, it provisions a new suitable node for the workload.

And in our case, since we are using a custom scheduler extender to extend the default scheduler's functionality, Karpenter is unable to simulate that and makes its decision based on the default behaviour alone. Did I get that right?