aws / karpenter-provider-aws

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
https://karpenter.sh
Apache License 2.0

Incorrect maxPods setting in case of custom network configuration for AWS CNI #6353

Closed: imunhatep closed this issue 3 months ago.

imunhatep commented 3 months ago

Description

Karpenter modifies the user data in the launch template so that it runs bootstrap.sh with calculated arguments, at least for the AL2 amiFamily.

maxPods is derived from an assumed per-node limit. As far as I can tell, the default maxPods value is calculated here: https://github.com/aws/karpenter-provider-aws/blob/main/pkg/providers/instancetype/types.go#L435
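For reference, here is a minimal Go sketch of the ENI-based formula documented in the AMI's eni-max-pods.txt (`ENIs * (IPv4 addresses per ENI - 1) + 2`). The linked Karpenter code appears to follow this formula. The type and function names are hypothetical, not Karpenter's API. The instance limits are the published EC2 values for those m5 sizes. The output also shows why one static maxPods cannot fit a NodePool that mixes sizes.

```go
package main

import "fmt"

// instanceNetworkInfo holds the two EC2 networking limits the formula uses.
// Hypothetical type for illustration only.
type instanceNetworkInfo struct {
	maxENIs    int64 // maximum network interfaces for the instance type
	ipv4PerENI int64 // IPv4 addresses per network interface
}

// eniLimitedPods computes usableENIs * (IPv4 addresses per ENI - 1) + 2.
// Each ENI's primary IP is not given to pods (hence -1); the +2 covers
// host-network pods such as aws-node and kube-proxy.
// reservedENIs models ENIs excluded from the calculation (0 by default).
func eniLimitedPods(info instanceNetworkInfo, reservedENIs int64) int64 {
	usable := info.maxENIs - reservedENIs
	if usable <= 0 {
		return 0
	}
	return usable*(info.ipv4PerENI-1) + 2
}

func main() {
	// Published EC2 limits for a few m5 sizes.
	types := []struct {
		name string
		info instanceNetworkInfo
	}{
		{"m5.large", instanceNetworkInfo{3, 10}},
		{"m5.xlarge", instanceNetworkInfo{4, 15}},
		{"m5.4xlarge", instanceNetworkInfo{8, 30}},
	}
	for _, t := range types {
		fmt.Printf("%s: maxPods=%d\n", t.name, eniLimitedPods(t.info, 0))
	}
}
```

Running it prints maxPods=29, 58 and 234 for m5.large, m5.xlarge and m5.4xlarge respectively.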

However, it ignores the AWS CNI custom networking configuration, i.e. the case where nodes and pods reside in different subnets. The AMI's calculator script handles this case correctly: https://github.com/awslabs/amazon-eks-ami/blob/v20231106/files/max-pods-calculator.sh#L135
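The sketch below illustrates the difference. It assumes, consistent with the EKS custom networking documentation, that the primary ENI assigns no pod IPs when custom networking is enabled. Under that assumption the calculator drops one ENI from the count. `maxPods` is a hypothetical helper, not the script's code. Any vCPU-based ceiling the script may apply is left out.

```go
package main

import "fmt"

// maxPods is a hypothetical restatement of the ENI formula:
// usableENIs * (IPv4 addresses per ENI - 1) + 2.
// With AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG=true the primary ENI stays in the
// node's subnet and hands out no pod IPs, so it is excluded from the count.
func maxPods(maxENIs, ipv4PerENI int64, customNetworking bool) int64 {
	enis := maxENIs
	if customNetworking {
		enis-- // primary ENI is not used for pod IPs
	}
	if enis <= 0 {
		return 0
	}
	return enis*(ipv4PerENI-1) + 2
}

func main() {
	// m5.large: 3 ENIs, 10 IPv4 addresses per ENI.
	fmt.Println("default:", maxPods(3, 10, false))
	fmt.Println("custom networking:", maxPods(3, 10, true))
}
```

For an m5.large this prints 29 by default and 20 with custom networking. The 9-pod gap is the primary ENI's share, which the CNI cannot actually use in this setup.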

The problem is that there seems to be no way to configure Karpenter to calculate maxPods correctly for this setup. Setting a static maxPods value does not help for NodePools that span different EC2 instance sizes, because the correct value differs per size.

Enabling prefix delegation in the AWS CNI is not always viable. It consumes more IPs, and it cannot be used on existing, in-use subnets whose allocated IPs are scattered across the range, because the CNI is then unable to allocate the required number of prefixes (each prefix is a contiguous /28 block).

Observed Behavior: Pods fail to start on nodes provisioned by Karpenter because the CNI cannot assign them IP addresses. The maxPods value Karpenter sets is higher than the number of pod IPs actually available.

Expected Behavior: Karpenter sets maxPods correctly based on the EC2 instance type/size and the CNI configuration.

Reproduction Steps (Please include YAML): Every EC2 instance launched for a NodePool gets an incorrect maxPods value.

EKS cluster with the AWS CNI deployed with the following configuration:

env:
  AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG: "true"
  AWS_VPC_K8S_CNI_EXTERNALSNAT: "true"
  ENABLE_PREFIX_DELEGATION: "false"

Versions:

jmdeal commented 3 months ago

You can set the number of reserved ENIs to exclude from the maxPods calculation, either through an environment variable or a CLI flag on the Karpenter controller (docs). Setting this to 1 should support your use case.
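The sketch below shows why reserving one ENI matches the custom networking case for every instance size. Both approaches reduce the usable ENI count by one before applying the same formula, so the value still scales per instance type, unlike a static maxPods. Both helpers are hypothetical restatements, not Karpenter or AMI code. To our knowledge, the Karpenter settings reference exposes this as the `RESERVED_ENIS` environment variable / `--reserved-enis` flag, but check the docs for your version.

```go
package main

import "fmt"

// karpenterMaxPods models the ENI-limited formula with a reserved-ENI count
// subtracted first (standing in for the controller's reserved-ENIs setting).
func karpenterMaxPods(maxENIs, ipv4PerENI, reservedENIs int64) int64 {
	usable := maxENIs - reservedENIs
	if usable <= 0 {
		return 0
	}
	return usable*(ipv4PerENI-1) + 2
}

// customNetworkingMaxPods drops the primary ENI, as the AMI calculator is
// understood to do with custom networking (vCPU-based ceilings omitted).
func customNetworkingMaxPods(maxENIs, ipv4PerENI int64) int64 {
	enis := maxENIs - 1
	if enis <= 0 {
		return 0
	}
	return enis*(ipv4PerENI-1) + 2
}

func main() {
	limits := []struct {
		name                string
		maxENIs, ipv4PerENI int64
	}{
		{"m5.large", 3, 10},
		{"m5.xlarge", 4, 15},
		{"m5.4xlarge", 8, 30},
	}
	for _, l := range limits {
		k := karpenterMaxPods(l.maxENIs, l.ipv4PerENI, 1)
		c := customNetworkingMaxPods(l.maxENIs, l.ipv4PerENI)
		fmt.Printf("%-10s reservedENIs=1: %3d  custom networking: %3d  match=%v\n", l.name, k, c, k == c)
	}
}
```

Each row reports `match=true`: 20 for m5.large, 44 for m5.xlarge and 205 for m5.4xlarge.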

imunhatep commented 3 months ago

Ahh, a simple solution that I overlooked. I'll try this out. Thank you!