aws / karpenter-provider-aws

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
https://karpenter.sh
Apache License 2.0

Consolidation scaling down new nodes #2468

Closed andrewhibbert closed 2 years ago

andrewhibbert commented 2 years ago

Version

Karpenter: v0.0.0

Kubernetes: v1.0.0

Expected Behavior

Karpenter should allow a configurable amount of time for a node to scale up and have running pods before scaling them down.

Actual Behavior

Karpenter calculates a disruption score and orders nodes by the time they came up, looking at older ones first. But this will still pick newer empty nodes first.

Steps to Reproduce the Problem

Resource Specs and Logs

tzneal commented 2 years ago

Can you provide an example scenario where Karpenter deletes an empty node it just launched, even though pods are pending?

andrewhibbert commented 2 years ago

I'll try. I was just seeing a huge amount of turnover when consolidation was switched on: newer nodes were being picked, and then extra ones were provisioned. It seems like it needs to be more configurable, so that empty(ish) nodes are left in place for a period of time before being removed.

tzneal commented 2 years ago

Consolidation uses an automatically adjusted stabilization window; if there are pending pods, unready ReplicaSets, etc., this window grows up to 5 minutes long. It also waits for a node to be fully initialized (ready, all extended resources registered, any startup taints removed, etc.) before considering it. The net effect is that it shouldn't delete a newly launched empty node, as there will be pending pods which delay any consolidation decisions.
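
For illustration only, a minimal sketch of such an adaptive stabilization window (in Go, with made-up names and a made-up 30-second base window; this is not Karpenter's actual code): consolidation proceeds only once the cluster has been quiet for the window, and the window stretches to the 5-minute cap while pods are pending or ReplicaSets are unready.

  package sketch

  import "time"

  // canConsolidate is a hypothetical helper: consolidation proceeds only when
  // the cluster state has been stable for the whole stabilization window.
  func canConsolidate(lastClusterChange time.Time, pendingPods, unreadyReplicaSets int) bool {
      window := 30 * time.Second // assumed base window, not Karpenter's real value
      if pendingPods > 0 || unreadyReplicaSets > 0 {
          window = 5 * time.Minute // cluster in flux: wait up to the 5-minute cap
      }
      return time.Since(lastClusterChange) >= window
  }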

Do you have the logs for when this occurred?

andrewhibbert commented 2 years ago

Just switched it on

karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:07:43.064Z  DEBUG   controller.consolidation    Relaxing soft constraints for pod since it previously failed to schedule, removing: spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[0]={"weight":1,"podAffinityTerm":{"labelSelector":{"matchExpressions":[{"key":"component","operator":"In","values":["gpt-normalization-service-async"]}]},"topologyKey":"failure-domain.beta.kubernetes.io/zone"}}    {"commit": "b157d45", "pod": "enrichment-services-np-00/enrichment-services-np-00-gpt-normalization-service-async-t6nrz"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:07:43.064Z  INFO    controller.consolidation    Consolidating via Delete, terminating 1 nodes ip-10-138-104-69.eu-west-1.compute.internal/c5a.2xlarge   {"commit": "b157d45"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:07:43.092Z  INFO    controller.termination  Cordoned node   {"commit": "b157d45", "node": "ip-10-138-104-69.eu-west-1.compute.internal"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:08:20.631Z  INFO    controller.termination  Deleted node    {"commit": "b157d45", "node": "ip-10-138-104-69.eu-west-1.compute.internal"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:12:51.492Z  DEBUG   controller.consolidation    Relaxing soft constraints for pod since it previously failed to schedule, removing: spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[0]={"weight":1,"podAffinityTerm":{"labelSelector":{"matchExpressions":[{"key":"component","operator":"In","values":["gpt-normalization-service-async"]}]},"topologyKey":"failure-domain.beta.kubernetes.io/zone"}}    {"commit": "b157d45", "pod": "enrichment-services-np-00/enrichment-services-np-00-gpt-normalization-service-async-p672c"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:12:51.497Z  DEBUG   controller.consolidation    Relaxing soft constraints for pod since it previously failed to schedule, removing: spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[0]={"weight":1,"podAffinityTerm":{"labelSelector":{"matchExpressions":[{"key":"component","operator":"In","values":["vtw-fetcher-service-sync"]}]},"topologyKey":"failure-domain.beta.kubernetes.io/zone"}}   {"commit": "b157d45", "pod": "enrichment-services-np-00/enrichment-services-np-00-vtw-fetcher-service-sync-77cfb58hkbkr"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:12:51.500Z  DEBUG   controller.consolidation    Relaxing soft constraints for pod since it previously failed to schedule, removing: spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[0]={"weight":1,"podAffinityTerm":{"labelSelector":{"matchExpressions":[{"key":"component","operator":"In","values":["vtw-fetcher-service-sync"]}]},"topologyKey":"failure-domain.beta.kubernetes.io/zone"}}   {"commit": "b157d45", "pod": "enrichment-services-np-01/enrichment-services-np-01-vtw-fetcher-service-sync-5d4986c557tt"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:12:51.501Z  INFO    controller.consolidation    Consolidating via Delete, terminating 1 nodes ip-10-138-100-234.eu-west-1.compute.internal/m3.2xlarge   {"commit": "b157d45"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:12:51.533Z  INFO    controller.termination  Cordoned node   {"commit": "b157d45", "node": "ip-10-138-100-234.eu-west-1.compute.internal"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:13:33.795Z  INFO    controller.termination  Deleted node    {"commit": "b157d45", "node": "ip-10-138-100-234.eu-west-1.compute.internal"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:17:59.825Z  DEBUG   controller.consolidation    Relaxing soft constraints for pod since it previously failed to schedule, removing: spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[0]={"weight":1,"podAffinityTerm":{"labelSelector":{"matchExpressions":[{"key":"component","operator":"In","values":["gpt-normalization-service-sync"]}]},"topologyKey":"failure-domain.beta.kubernetes.io/zone"}} {"commit": "b157d45", "pod": "enrichment-services-np-00/enrichment-services-np-00-gpt-normalization-service-sync-54hpbf"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:17:59.830Z  DEBUG   controller.consolidation    Relaxing soft constraints for pod since it previously failed to schedule, removing: spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[0]={"weight":1,"podAffinityTerm":{"labelSelector":{"matchExpressions":[{"key":"component","operator":"In","values":["rx-chem-client-service-sync"]}]},"topologyKey":"failure-domain.beta.kubernetes.io/zone"}}    {"commit": "b157d45", "pod": "enrichment-services-np-99/enrichment-services-np-99-rx-chem-client-service-sync-6c97dbkks"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:17:59.833Z  DEBUG   controller.consolidation    Relaxing soft constraints for pod since it previously failed to schedule, removing: spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[0]={"weight":1,"podAffinityTerm":{"labelSelector":{"matchExpressions":[{"key":"component","operator":"In","values":["entity-filter-service-sync"]}]},"topologyKey":"failure-domain.beta.kubernetes.io/zone"}} {"commit": "b157d45", "pod": "enrichment-services-np-01/enrichment-services-np-01-entity-filter-service-sync-5fb8fdqlz2"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:17:59.837Z  DEBUG   controller.consolidation    Relaxing soft constraints for pod since it previously failed to schedule, removing: spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[0]={"weight":1,"podAffinityTerm":{"labelSelector":{"matchExpressions":[{"key":"component","operator":"In","values":["rx-reg-client-service-sync"]}]},"topologyKey":"failure-domain.beta.kubernetes.io/zone"}} {"commit": "b157d45", "pod": "enrichment-services-np-01/enrichment-services-np-01-rx-reg-client-service-sync-795ddkmv4g"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:17:59.840Z  DEBUG   controller.consolidation    Relaxing soft constraints for pod since it previously failed to schedule, removing: spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[0]={"weight":1,"podAffinityTerm":{"labelSelector":{"matchExpressions":[{"key":"component","operator":"In","values":["termite-service-async"]}]},"topologyKey":"failure-domain.beta.kubernetes.io/zone"}}  {"commit": "b157d45", "pod": "enrichment-services-np-50/enrichment-services-np-50-termite-service-async-55fdc685c9crnlw"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:17:59.841Z  INFO    controller.consolidation    Consolidating via Delete, terminating 1 nodes ip-10-138-106-46.eu-west-1.compute.internal/c5a.2xlarge   {"commit": "b157d45"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:17:59.866Z  INFO    controller.termination  Cordoned node   {"commit": "b157d45", "node": "ip-10-138-106-46.eu-west-1.compute.internal"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:18:42.128Z  INFO    controller.termination  Deleted node    {"commit": "b157d45", "node": "ip-10-138-106-46.eu-west-1.compute.internal"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:23:08.679Z  DEBUG   controller.consolidation    Relaxing soft constraints for pod since it previously failed to schedule, removing: spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[0]={"weight":1,"podAffinityTerm":{"labelSelector":{"matchExpressions":[{"key":"component","operator":"In","values":["gpt-normalization-service-sync"]}]},"topologyKey":"failure-domain.beta.kubernetes.io/zone"}} {"commit": "b157d45", "pod": "enrichment-services-np-99/enrichment-services-np-99-gpt-normalization-service-sync-7zq8m5"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:23:08.683Z  DEBUG   controller.consolidation    Relaxing soft constraints for pod since it previously failed to schedule, removing: spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[0]={"weight":1,"podAffinityTerm":{"labelSelector":{"matchExpressions":[{"key":"component","operator":"In","values":["field-extraction-service-sync"]}]},"topologyKey":"failure-domain.beta.kubernetes.io/zone"}}  {"commit": "b157d45", "pod": "enrichment-services-np-99/enrichment-services-np-99-field-extraction-service-sync-58nsbc6"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:23:08.687Z  DEBUG   controller.consolidation    Relaxing soft constraints for pod since it previously failed to schedule, removing: spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[0]={"weight":1,"podAffinityTerm":{"labelSelector":{"matchExpressions":[{"key":"component","operator":"In","values":["ftd-feeder-service-async"]}]},"topologyKey":"failure-domain.beta.kubernetes.io/zone"}}   {"commit": "b157d45", "pod": "enrichment-services-np-00/enrichment-services-np-00-ftd-feeder-service-async-5758b66qmx25"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:23:08.690Z  DEBUG   controller.consolidation    Relaxing soft constraints for pod since it previously failed to schedule, removing: spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[0]={"weight":1,"podAffinityTerm":{"labelSelector":{"matchExpressions":[{"key":"component","operator":"In","values":["caas-document-index-service-sync"]}]},"topologyKey":"failure-domain.beta.kubernetes.io/zone"}}   {"commit": "b157d45", "pod": "enrichment-services-np-50/enrichment-services-np-50-caas-document-index-service-sync2vhjn"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:23:08.693Z  DEBUG   controller.consolidation    Relaxing soft constraints for pod since it previously failed to schedule, removing: spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[0]={"weight":1,"podAffinityTerm":{"labelSelector":{"matchExpressions":[{"key":"component","operator":"In","values":["classification-filter-service-sync"]}]},"topologyKey":"failure-domain.beta.kubernetes.io/zone"}} {"commit": "b157d45", "pod": "enrichment-services-np-00/enrichment-services-np-00-classification-filter-service-sy5br4r"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:23:08.696Z  DEBUG   controller.consolidation    Relaxing soft constraints for pod since it previously failed to schedule, removing: spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[0]={"weight":1,"podAffinityTerm":{"labelSelector":{"matchExpressions":[{"key":"component","operator":"In","values":["termite-service-sync"]}]},"topologyKey":"failure-domain.beta.kubernetes.io/zone"}}   {"commit": "b157d45", "pod": "enrichment-services-np-00/enrichment-services-np-00-termite-service-sync-684dcb49bc-99c4h"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:23:08.698Z  INFO    controller.consolidation    Consolidating via Delete, terminating 1 nodes ip-10-138-103-62.eu-west-1.compute.internal/c5a.2xlarge   {"commit": "b157d45"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:23:08.721Z  INFO    controller.termination  Cordoned node   {"commit": "b157d45", "node": "ip-10-138-103-62.eu-west-1.compute.internal"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:23:49.380Z  INFO    controller.termination  Deleted node    {"commit": "b157d45", "node": "ip-10-138-103-62.eu-west-1.compute.internal"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:28:17.537Z  DEBUG   controller.consolidation    Relaxing soft constraints for pod since it previously failed to schedule, removing: spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[0]={"weight":1,"podAffinityTerm":{"labelSelector":{"matchExpressions":[{"key":"component","operator":"In","values":["classification-filter-service-async"]}]},"topologyKey":"failure-domain.beta.kubernetes.io/zone"}}    {"commit": "b157d45", "pod": "enrichment-services-np-00/enrichment-services-np-00-classification-filter-service-ast2tg5"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:28:17.541Z  DEBUG   controller.consolidation    Relaxing soft constraints for pod since it previously failed to schedule, removing: spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[0]={"weight":1,"podAffinityTerm":{"labelSelector":{"matchExpressions":[{"key":"component","operator":"In","values":["entity-filter-service-sync"]}]},"topologyKey":"failure-domain.beta.kubernetes.io/zone"}} {"commit": "b157d45", "pod": "enrichment-services-np-01/enrichment-services-np-01-entity-filter-service-sync-5fb8f9b4zr"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:28:17.544Z  DEBUG   controller.consolidation    Relaxing soft constraints for pod since it previously failed to schedule, removing: spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[0]={"weight":1,"podAffinityTerm":{"labelSelector":{"matchExpressions":[{"key":"component","operator":"In","values":["datalake-write-service-sync"]}]},"topologyKey":"failure-domain.beta.kubernetes.io/zone"}}    {"commit": "b157d45", "pod": "enrichment-services-np-00/enrichment-services-np-00-datalake-write-service-sync-7458m4rt9"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:28:17.546Z  DEBUG   controller.consolidation    Relaxing soft constraints for pod since it previously failed to schedule, removing: spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[0]={"weight":1,"podAffinityTerm":{"labelSelector":{"matchExpressions":[{"key":"component","operator":"In","values":["datalake-write-service-async"]}]},"topologyKey":"failure-domain.beta.kubernetes.io/zone"}}   {"commit": "b157d45", "pod": "enrichment-services-np-00/enrichment-services-np-00-datalake-write-service-async-847cxdkj"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:28:17.549Z  DEBUG   controller.consolidation    Relaxing soft constraints for pod since it previously failed to schedule, removing: spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[0]={"weight":1,"podAffinityTerm":{"labelSelector":{"matchExpressions":[{"key":"component","operator":"In","values":["datalake-read-service-sync"]}]},"topologyKey":"failure-domain.beta.kubernetes.io/zone"}} {"commit": "b157d45", "pod": "enrichment-services-np-00/enrichment-services-np-00-datalake-read-service-sync-cf9f6rph9c"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:28:17.552Z  DEBUG   controller.consolidation    Relaxing soft constraints for pod since it previously failed to schedule, removing: spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[0]={"weight":1,"podAffinityTerm":{"labelSelector":{"matchExpressions":[{"key":"component","operator":"In","values":["datalake-read-service-async"]}]},"topologyKey":"failure-domain.beta.kubernetes.io/zone"}}    {"commit": "b157d45", "pod": "enrichment-services-np-00/enrichment-services-np-00-datalake-read-service-async-66f5k8bhr"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:28:17.553Z  INFO    controller.consolidation    Consolidating via Delete, terminating 1 nodes ip-10-138-104-127.eu-west-1.compute.internal/c5a.2xlarge  {"commit": "b157d45"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:28:17.587Z  INFO    controller.termination  Cordoned node   {"commit": "b157d45", "node": "ip-10-138-104-127.eu-west-1.compute.internal"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:28:19.715Z  DEBUG   controller.provisioning 14 out of 509 instance types were excluded because they would breach provisioner limits {"commit": "b157d45"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:28:19.716Z  DEBUG   controller.provisioning 14 out of 509 instance types were excluded because they would breach provisioner limits {"commit": "b157d45"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:28:19.716Z  DEBUG   controller.provisioning 200 out of 542 instance types were excluded because they would breach provisioner limits    {"commit": "b157d45"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:28:19.716Z  DEBUG   controller.provisioning 188 out of 509 instance types were excluded because they would breach provisioner limits    {"commit": "b157d45"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:28:19.717Z  DEBUG   controller.provisioning 29 out of 509 instance types were excluded because they would breach provisioner limits {"commit": "b157d45"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:28:19.717Z  DEBUG   controller.provisioning 14 out of 509 instance types were excluded because they would breach provisioner limits {"commit": "b157d45"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:28:19.718Z  DEBUG   controller.provisioning 7 out of 542 instance types were excluded because they would breach provisioner limits  {"commit": "b157d45"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:28:19.718Z  DEBUG   controller.provisioning 55 out of 509 instance types were excluded because they would breach provisioner limits {"commit": "b157d45"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:28:19.719Z  DEBUG   controller.provisioning Relaxing soft constraints for pod since it previously failed to schedule, removing: spec.affinity.podAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution[0]={"weight":1,"podAffinityTerm":{"labelSelector":{"matchExpressions":[{"key":"component","operator":"In","values":["entity-filter-service-sync"]}]},"topologyKey":"failure-domain.beta.kubernetes.io/zone"}} {"commit": "b157d45", "pod": "enrichment-services-np-01/enrichment-services-np-01-entity-filter-service-sync-5fb8f9t6vd"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:28:19.720Z  DEBUG   controller.provisioning 14 out of 509 instance types were excluded because they would breach provisioner limits {"commit": "b157d45"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:28:19.720Z  DEBUG   controller.provisioning 14 out of 509 instance types were excluded because they would breach provisioner limits {"commit": "b157d45"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:28:19.721Z  DEBUG   controller.provisioning 200 out of 542 instance types were excluded because they would breach provisioner limits    {"commit": "b157d45"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:28:19.721Z  DEBUG   controller.provisioning 188 out of 509 instance types were excluded because they would breach provisioner limits    {"commit": "b157d45"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:28:19.721Z  DEBUG   controller.provisioning 29 out of 509 instance types were excluded because they would breach provisioner limits {"commit": "b157d45"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:28:19.722Z  DEBUG   controller.provisioning 14 out of 509 instance types were excluded because they would breach provisioner limits {"commit": "b157d45"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:28:19.722Z  DEBUG   controller.provisioning 7 out of 542 instance types were excluded because they would breach provisioner limits  {"commit": "b157d45"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:28:19.728Z  INFO    controller.provisioning Found 1 provisionable pod(s)    {"commit": "b157d45"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:28:19.728Z  INFO    controller.provisioning Computed 1 new node(s) will fit 1 pod(s)    {"commit": "b157d45"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:28:19.731Z  INFO    controller.provisioning Launching node with 1 pods requesting {"cpu":"991m","memory":"3876970752","pods":"10"} from types inf1.2xlarge, c3.2xlarge, r3.2xlarge, c5a.2xlarge, t3a.2xlarge and 91 other(s)    {"commit": "b157d45", "provisioner": "enrichment-service"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:28:19.999Z  DEBUG   controller.provisioning.cloudprovider   Created launch template, Karpenter-nonprod-shared1-1433741913800812631  {"commit": "b157d45", "provisioner": "enrichment-service"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:28:22.014Z  INFO    controller.provisioning.cloudprovider   Launched instance: i-0acdf6e3d9e03c063, hostname: ip-10-138-110-171.eu-west-1.compute.internal, type: c5a.2xlarge, zone: eu-west-1c, capacityType: spot {"commit": "b157d45", "provisioner": "enrichment-service"}

These aren't new nodes, although I suspect it may become more apparent over time. You can see, however, that it has cordoned one node:

karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:28:17.553Z  INFO    controller.consolidation    Consolidating via Delete, terminating 1 nodes ip-10-138-104-127.eu-west-1.compute.internal/c5a.2xlarge  {"commit": "b157d45"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:28:17.587Z  INFO    controller.termination  Cordoned node   {"commit": "b157d45", "node": "ip-10-138-104-127.eu-west-1.compute.internal"}

Then it had to spin up another one exactly the same, 2 seconds later:

karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:28:19.728Z  INFO    controller.provisioning Found 1 provisionable pod(s)    {"commit": "b157d45"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:28:19.728Z  INFO    controller.provisioning Computed 1 new node(s) will fit 1 pod(s)    {"commit": "b157d45"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:28:19.731Z  INFO    controller.provisioning Launching node with 1 pods requesting {"cpu":"991m","memory":"3876970752","pods":"10"} from types inf1.2xlarge, c3.2xlarge, r3.2xlarge, c5a.2xlarge, t3a.2xlarge and 91 other(s)    {"commit": "b157d45", "provisioner": "enrichment-service"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:28:19.999Z  DEBUG   controller.provisioning.cloudprovider   Created launch template, Karpenter-nonprod-shared1-1433741913800812631  {"commit": "b157d45", "provisioner": "enrichment-service"}
karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:28:22.014Z  INFO    controller.provisioning.cloudprovider   Launched instance: i-0acdf6e3d9e03c063, hostname: ip-10-138-110-171.eu-west-1.compute.internal, type: c5a.2xlarge, zone: eu-west-1c, capacityType: spot {"commit": "b157d45", "provisioner": "enrichment-service"}

The other one is then deleted a short time later:

karpenter-7fd86b488d-b7wfs controller 2022-09-06T18:28:59.879Z  INFO    controller.termination  Deleted node    {"commit": "b157d45", "node": "ip-10-138-104-127.eu-west-1.compute.internal"}

I just think it is possibly going a bit quickly, resulting in churn of nodes/pods, and needs to be more configurable.

tzneal commented 2 years ago

Were new pods being created at this time or were they solely from evictions? If it was just from an eviction, I would expect it to repeat since it replaced the node with an identical node. If it does, can you run kubectl describe pod pod-name for the pod that was evicted to determine why kube-scheduler thought it wouldn't schedule?

The only scenarios I can see causing this are: 1) a new pod was created, or 2) our scheduling logic has a mistake and thinks the pod should fit on another node, but it can't.

I don't see option 2 as being likely, since if it occurred we wouldn't launch the second node. Karpenter would continue to think the pod would schedule on an existing node.

jBouyoud commented 2 years ago

I have the same kind of behavior:

Karpenter: v0.16.0 Kubernetes: v1.21.14

Log extract:

Date,Service,Message
"2022-09-07T16:00:25.485Z","""karpenter""","Deleted node"
"2022-09-07T16:00:25.290Z","""karpenter""","Cordoned node"
"2022-09-07T16:00:25.258Z","""karpenter""","Consolidating via Delete (empty node), terminating 1 nodes ip-10-36-113-215.eu-west-3.compute.internal/m5a.xlarge"
"2022-09-07T15:59:11.346Z","""karpenter""","Launched instance: i-050a61e60159e2a55, hostname: ip-10-36-113-215.eu-west-3.compute.internal, type: m5a.xlarge, zone: eu-west-3a, capacityType: on-demand"
"2022-09-07T15:59:09.206Z","""karpenter""","Launching node with 1 pods requesting {""cpu"":""1390m"",""memory"":""2098Mi"",""pods"":""7""} from types m5a.xlarge, m5.xlarge, m6i.xlarge, m5ad.xlarge, m5d.xlarge and 31 other(s)"
"2022-09-07T15:59:09.205Z","""karpenter""","Computed 1 new node(s) will fit 1 pod(s)"
"2022-09-07T15:59:09.205Z","""karpenter""","Found 1 provisionable pod(s)"

It seems like the "1 provisionable pod" finally fit on an existing node, so the new node was empty by the next consolidation cycle.

There were no evictions at the same time.

tzneal commented 2 years ago

Is this reproducible? I think the most likely scenario for your log @jBouyoud is that another pod was terminated during the node launch which made space available so the new node was unnecessary. It's not possible to tell that from the log, but Karpenter only considers pods provisionable that kube-scheduler has marked as unschedulable. It must have given up on scheduling the pod before Karpenter would have created the node.
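
As an aside, a sketch of the kind of check described above (using the standard Kubernetes pod conditions; this is an illustration, not Karpenter's actual code): a pod only counts as provisionable once kube-scheduler has marked it unschedulable.

  package sketch

  import corev1 "k8s.io/api/core/v1"

  // markedUnschedulable reports whether kube-scheduler has given up on the pod,
  // i.e. its PodScheduled condition is False with reason Unschedulable.
  func markedUnschedulable(pod *corev1.Pod) bool {
      for _, cond := range pod.Status.Conditions {
          if cond.Type == corev1.PodScheduled &&
              cond.Status == corev1.ConditionFalse &&
              cond.Reason == corev1.PodReasonUnschedulable {
              return true
          }
      }
      return false
  }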

jBouyoud commented 2 years ago

Is this reproducible? I think the most likely scenario for your log @jBouyoud is that another pod was terminated during the node launch which made space available so the new node was unnecessary. It's not possible to tell that from the log, but Karpenter only considers pods provisionable that kube-scheduler has marked as unschedulable. It must have given up on scheduling the pod before Karpenter would have created the node.

@tzneal, yes, it happens several times a day during our working hours while people are deploying stuff (meaning multiple deployment rollouts) on the cluster. This behavior is directly linked to deployment activity. Your scenario seems legit, but I think the pod has been scheduled to another node (where another pod has been terminated). Would increasing the batchIdleDuration help to reduce this noise?

jBouyoud commented 2 years ago

Does Karpenter take into account nodes that are currently bootstrapping before scheduling a new one?

When this behavior happens, pods are unschedulable due to a not-ready node.

30s (x4 over 40s) default-scheduler 0/30 nodes are available: 1 Insufficient memory, 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate, 1 node(s) were unschedulable, 23 Insufficient cpu, 5 node(s) had taint {...}, that the pod didn't tolerate.

tzneal commented 2 years ago

@jBouyoud When replacing a node it will. In the log you provided, it was just deleting a node as it thought the pods could fit elsewhere.

How are the deployments being performed? Is the deployment being deleted and re-created, or just an image changed? I'm trying to reproduce this locally.

jBouyoud commented 2 years ago

How are the deployments being performed? Is the deployment being deleted and re-created, or just an image changed? I'm trying to reproduce this locally.

Never deleted and recreated; mainly (99.5%) just an image version change.

jBouyoud commented 2 years ago

Another, more complete example of a deployment rollout:

  Warning  FailedScheduling  82s (x2 over 84s)  default-scheduler  0/29 nodes are available: 1 node(s) were unschedulable, 2 Insufficient memory, 22 Insufficient cpu, 5 node(s) had taint {xxxxxxxx}, that the pod didn't tolerate.
  Normal   Scheduled         22s                default-scheduler  Successfully assigned xxxx/XXXXXX-b64466dfd-f99h2 to ip-10-36-200-186.eu-west-3.compute.internal
  Normal   Pulled            21s                kubelet            Container image "hashicorp/vault:1.11.2@sha256:a60891bfb7b7a669d21544e0ad1b178e09a78174d4995e79fb11faf9a741e2ca" already present on machine
  Normal   Created           21s                kubelet            Created container vault-agent-init
  Normal   Nominate          21s                karpenter          Pod should schedule on ip-10-36-250-77.eu-west-3.compute.internal
  Normal   Started           21s                kubelet            Started container vault-agent-init

And at the same time in the Karpenter controller:

{"level":"INFO","time":"2022-09-08T13:09:27.622Z","logger":"controller.provisioning","message":"Found 1 provisionable pod(s)","commit":"b157d45"}
{"level":"INFO","time":"2022-09-08T13:09:27.622Z","logger":"controller.provisioning","message":"Computed 1 new node(s) will fit 1 pod(s)","commit":"b157d45"}
{"level":"INFO","time":"2022-09-08T13:09:27.624Z","logger":"controller.provisioning","message":"Launching node with 1 pods requesting {\"cpu\":\"1390m\",\"memory\":\"2098Mi\",\"pods\":\"7\"} from types t3a.xlarge, c5a.xlarge, t3.xlarge, c5.xlarge, c6i.xlarge and 100 other(s)","commit":"b157d45","provisioner":"default"}
{"level":"INFO","time":"2022-09-08T13:09:29.643Z","logger":"controller.provisioning.cloudprovider","message":"Launched instance: i-08e80998cf3c3e208, hostname: ip-10-36-250-77.eu-west-3.compute.internal, type: t3a.xlarge, zone: eu-west-3c, capacityType: on-demand","commit":"b157d45","provisioner":"default"}

I hope this helps. Feel free to ask if you need more information.

tzneal commented 2 years ago

@jBouyoud What is the update strategy for one of these deployments? I'm still working on reproducing this locally.

jBouyoud commented 2 years ago

Almost all of our workloads use this (the number of replicas can be in [1,5]):

  progressDeadlineSeconds: 600
  terminationGracePeriodSeconds: 60
  replicas: 2
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate

Our apps are mainly composed of 3 Deployments, so when a version changes we have at least 3 deployment rollouts at the same time.

I don't think this is very important for Karpenter (or for this case), but we also have some initContainers (with resource specs different from the main container).

tzneal commented 2 years ago

I'm still looking into this, but I can reproduce the extra node that you see getting deleted. It's because of the surge in your update strategy: in your case, if you have 5 pods and update the deployment image, it will surge by 25%, or +2 pods; if they don't all fit in your cluster, the new small node will be launched. Eventually the update finishes rolling through and the new node isn't needed.
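
To make the arithmetic explicit (assuming the standard Deployment rounding rules, where maxSurge rounds up and maxUnavailable rounds down; the 5-replica figure is the hypothetical one used above):

  package sketch

  import "math"

  // surgePods computes how many extra pods a rolling update may create:
  // 25% of 5 replicas is 1.25, which rounds up to 2 extra pods, so the
  // Deployment can briefly run 7 pods and may need a short-lived extra node.
  func surgePods(replicas int, maxSurgePercent float64) int {
      return int(math.Ceil(float64(replicas) * maxSurgePercent / 100))
  }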

jBouyoud commented 2 years ago

👌 Seems legit. Thanks

A way of tuning the aggressiveness of the consolidation process would probably help with this case (but that's not the topic here).

Thanks for the explanation.

tzneal commented 2 years ago

I'm going to close this then, but feel free to re-open or create a new issue if you see something that isn't explained by surge.

andrewhibbert commented 2 years ago

I think in my case it might be because it does this:

We end up with extra nodes because the topology is relaxed. Perhaps the topology should be relaxed before adding a new node?

tzneal commented 2 years ago

@andrewhibbert Sorry, I'm not following. I added some information inline below regarding what Karpenter does when scheduling:

andrewhibbert commented 2 years ago

This is the bit I mean: https://github.com/aws/karpenter/blob/main/pkg/controllers/provisioning/scheduling/scheduler.go#L123, which removes a pod anti-affinity. Anyway, it is likely done correctly.
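
For reference, a simplified sketch of what relaxing a preferred pod anti-affinity amounts to (an illustration using the standard Kubernetes API types, not the linked scheduler.go code): the soft preferredDuringSchedulingIgnoredDuringExecution terms are dropped from a copy of the pod, while any required (hard) anti-affinity is left in place.

  package sketch

  import corev1 "k8s.io/api/core/v1"

  // relaxPreferredAntiAffinity returns a copy of the pod with its soft
  // anti-affinity preferences removed so it can be placed more freely;
  // required anti-affinity terms are kept.
  func relaxPreferredAntiAffinity(pod *corev1.Pod) *corev1.Pod {
      relaxed := pod.DeepCopy()
      if relaxed.Spec.Affinity != nil && relaxed.Spec.Affinity.PodAntiAffinity != nil {
          relaxed.Spec.Affinity.PodAntiAffinity.PreferredDuringSchedulingIgnoredDuringExecution = nil
      }
      return relaxed
  }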