aws / karpenter-provider-aws

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
https://karpenter.sh
Apache License 2.0

v0.33.2 Karpenter won't scale nodes on 1.28 EKS version: #5594

Closed: VladFCarsDevops closed this issue 7 months ago

VladFCarsDevops commented 9 months ago

Description

Observed Behavior: Karpenter pod logs:

Could not schedule pod, incompatible with nodepool "np-244-nodepool", daemonset overhead={"cpu":"780m","memory":"1120Mi","pods":"6"}, no instance type satisfied resources {"cpu":"6780m","memory":"11360Mi","pods":"7"} and requirements karpenter.k8s.aws/instance-category In [c m r], karpenter.k8s.aws/instance-cpu In [16 32 36 4 48 and 1 others], karpenter.sh/capacity-type In [on-demand], karpenter.sh/nodepool In [np-244-nodepool], kubernetes.io/arch In [amd64], topology.kubernetes.io/zone In [us-east-1a us-east-1b us-east-1c] (no instance type has enough resources)

I tried to remove everything from the requirements to make the NodePool as flexible as possible, but I got the same error:

Could not schedule pod, incompatible with nodepool "np-244-nodepool", daemonset overhead={"cpu":"780m","memory":"1120Mi","pods":"6"}, no instance type satisfied resources (no instance type has enough resources)

Expected Behavior: Karpenter scales nodes dynamically regardless of the workload.

Reproduction Steps (Please include YAML):

resource "helm_release" "karpenter" {
  namespace        = "karpenter"
  create_namespace = true

  name       = "karpenter"
  repository = "oci://public.ecr.aws/karpenter"
  chart      = "karpenter"
  version    = "v0.33.2"

  wait = true

  set {
    name  = "serviceAccount.annotations.eks\\.amazonaws\\.com/role-arn"
    value = var.karpenter_controller_arn
  }

  set {
    name  = "settings.clusterName"
    value = var.eks_name
  }

  set {
    name  = "settings.clusterEndpoint"
    value = var.cluster_endpoint
  }

  set {
    name  = "settings.defaultInstanceProfile"
    value = "np-244-KarpenterNodeInstanceProfile"
  }

  set {
    name  = "logLevel"
    value = "debug"
  }
}

resource "kubectl_manifest" "nodepool" {
  yaml_body = <<-YAML
    apiVersion: karpenter.sh/v1beta1
    kind: NodePool
    metadata:
      name: ${var.workspace}-nodepool
    spec:
      template:
        spec:
          requirements:

resource "kubectl_manifest" "ec2nodeclass" {
  yaml_body = <<-YAML
    apiVersion: karpenter.k8s.aws/v1beta1
    kind: EC2NodeClass
    metadata:
      name: ${var.workspace}-node-class
    spec:
      amiFamily: "AL2"
      role: "${var.workspace}-karpenter-controller"
      subnetSelectorTerms:

IAM configurations:

data "aws_iam_policy_document" "karpenter_controller_assume_role_policy" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]
    effect  = "Allow"

    condition {
      test     = "StringEquals"
      variable = "${replace(aws_iam_openid_connect_provider.this.url, "https://", "")}:sub"
      values   = ["system:serviceaccount:karpenter:karpenter"]
    }

    principals {
      identifiers = [aws_iam_openid_connect_provider.this.arn]
      type        = "Federated"
    }
  }
}

resource "aws_iam_policy" "karpenter_policy" {
  name        = "${var.workspace}-KarpenterPolicy"
  path        = "/"
  description = "Policy for Karpenter"

  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "KarpenterInstanceProfileManagement",
      "Effect": "Allow",
      "Action": [
        "iam:CreateInstanceProfile",
        "iam:AddRoleToInstanceProfile",
        "iam:RemoveRoleFromInstanceProfile",
        "iam:PassRole",
        "iam:GetInstanceProfile",
        "iam:TagInstanceProfile"
      ],
      "Resource": "*"
    },
    {
      "Sid": "KarpenterEC2Actions",
      "Effect": "Allow",
      "Action": [
        "ec2:RunInstances",
        "ec2:DescribeSubnets",
        "ec2:DescribeSpotPriceHistory",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeLaunchTemplates",
        "ec2:DescribeInstances",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeInstanceTypeOfferings",
        "ec2:DescribeAvailabilityZones",
        "ec2:DescribeImages",
        "ec2:DeleteLaunchTemplate",
        "ec2:CreateTags",
        "ec2:CreateLaunchTemplate",
        "ec2:CreateFleet",
        "ssm:GetParameter",
        "pricing:GetProducts"
      ],
      "Resource": "*"
    },
    {
      "Sid": "ConditionalEC2Termination",
      "Effect": "Allow",
      "Action": "ec2:TerminateInstances",
      "Resource": "*",
      "Condition": {
        "StringLike": {
          "ec2:ResourceTag/Name": "karpenter*"
        }
      }
    }
  ]
}
EOF
}

resource "aws_iam_role" "karpenter_controller" {
  assume_role_policy = data.aws_iam_policy_document.karpenter_controller_assume_role_policy.json
  name               = "${var.workspace}-karpenter-controller"
}

resource "aws_iam_policy" "karpenter_controller" {
  policy = aws_iam_policy.karpenter_policy.policy
  name   = "${var.workspace}-karpenter-controller"
}

resource "aws_iam_role_policy_attachment" "karpenter_controller_attach" {
  role       = aws_iam_role.karpenter_controller.name
  policy_arn = aws_iam_policy.karpenter_controller.arn
}

resource "aws_iam_instance_profile" "karpenter" {
  name = "${var.workspace}-KarpenterNodeInstanceProfile"
  role = aws_iam_role.kubernetes-worker-role.name
}

Versions:

jonathan-innis commented 9 months ago

Can you share the status from your EC2NodeClass? Typically, you will see this error when Karpenter isn't able to discover your subnets and you don't have any zones that Karpenter can leverage for scheduling pods to instance types.
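
For anyone checking the same thing, this is roughly what a healthy status looks like (a sketch with placeholder name and IDs, assuming kubectl access to the cluster); empty subnets or securityGroups lists here mean the selector terms didn't match anything, so Karpenter has no zones to offer:

# kubectl get ec2nodeclass np-244-node-class -o yaml   (resource name is a placeholder)
status:
  subnets:
    - id: subnet-0123456789abcdef0
      zone: us-east-1a
    - id: subnet-0fedcba9876543210
      zone: us-east-1b
  securityGroups:
    - id: sg-0123456789abcdef0
      name: np-244-node-sg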

VladFCarsDevops commented 9 months ago

Hi @jonathan-innis Thanks for responding! I figured out the issue. The core issue was that the initial error message:

Could not schedule pod, incompatible with nodepool "np-244-nodepool", daemonset overhead={"cpu":"780m","memory":"1120Mi","pods":"6"}, no instance type satisfied resources

was misleading! The problem was that the EC2NodeClass manifest was referring to a role:

role: "${var.workspace}-karpenter-controller"

rather than to an instanceProfile. The official documentation (https://karpenter.sh/v0.33/concepts/nodeclasses/) says that instanceProfile is optional and that you can specify either role or instanceProfile. In my case it did not work with role but did work with instanceProfile, so the error message gave me no useful information for debugging and instead pointed me in the wrong direction; it was resolved purely by experimenting. I think the log output should say something about the missing instance profile here, and maybe the documentation should be updated?
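
For reference, the two forms the docs describe look roughly like this (a sketch with placeholder names; selector terms and other fields omitted). With role, Karpenter creates and manages the instance profile for you; with instanceProfile, you point at one you manage yourself, and you set one or the other, never both:

apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: np-244-node-class
spec:
  amiFamily: AL2
  # Option 1: the IAM role the nodes should use; Karpenter creates the instance profile itself
  role: "np-244-karpenter-node-role"
  # Option 2: a pre-created instance profile (mutually exclusive with role)
  # instanceProfile: "np-244-KarpenterNodeInstanceProfile"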

VladFCarsDevops commented 9 months ago

@jonathan-innis Another issue I noticed: when I create the NodePool and EC2NodeClass from plain Kubernetes YAML manifests, it works as expected. However, if I create them with the kubernetes_resource Terraform resource, I get the same error even though the configuration is identical. It does work with kubectl_manifest, though.

engedaam commented 9 months ago

In my case, it did not work with role, but worked with InstanceProfile

@VladFCarsDevops are you using a private cluster?

However, if I create NodePool and EC2NodeClass from the kubernetes_resource terraform resource, it will result in the same error even though the configuration is identical

Is this when you are using role for the EC2NodeClass?

VladFCarsDevops commented 9 months ago

@engedaam My EKS cluster endpoints are both Public and Private.

Yes! When I switched from role to instanceProfile in the EC2NodeClass, it fixed that misleading error. The official documentation gives you the option to set one of them, but not both, since setting both results in an error when applying. My role had almost full access permissions and was correctly attached, but it still produced the errors I posted above until I switched to instanceProfile.

engedaam commented 9 months ago

@VladFCarsDevops would you be willing to make a PR for the documentation update?

VladFCarsDevops commented 9 months ago

@engedaam Sure, can you point me to the right location?

engedaam commented 9 months ago

It would be here https://github.com/aws/karpenter-provider-aws/tree/main/website/content/en. You will need to make the same changes to v0.32, v0.33, and v0.34

jonathan-innis commented 9 months ago

was misleading! The problem was that the EC2NodeClass manifest was referring to a role

@VladFCarsDevops This is surprising to me. From what I know about the current state of the code, we shouldn't return back a different response during scheduling when using an instance profile vs. using a role. It's a bit hard to parse the terraform manifests that you pasted above (also, unfortunately none of the maintainers on the karpenter team are TF experts). Do you have direct access to the cluster and if you do, could you post the YAML version of the EC2NodeClass and NodePool when you have the instance profile vs. when you have the role?

Also, as for surfacing this information better: we're currently talking about how we can improve observability for Karpenter using status conditions across all of our resources; that discussion is here: https://github.com/kubernetes-sigs/karpenter/issues/493. I'd imagine that surfacing a condition like InstanceProfileReady directly would have helped debugging here.
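
Purely as an illustration of that idea (hypothetical; the condition names below are made up, and the real shape is what that issue is discussing), an EC2NodeClass that failed to resolve its instance profile could then report something like:

status:
  conditions:
    - type: InstanceProfileReady   # hypothetical condition name
      status: "False"
      reason: InstanceProfileNotResolved
      message: could not resolve an instance profile from spec.role
    - type: SubnetsReady           # hypothetical condition name
      status: "True"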

VladFCarsDevops commented 9 months ago

@jonathan-innis Oh, I tried creating the EC2NodeClass and NodePool with plain YAMLs and had the same errors until I changed from role to an instanceProfile; a friend of mine at another company ran into the same issue. I think updating the instructions in the docs will save people a ton of time debugging a misleading log message when the problem has nothing to do with resources.

jonathan-innis commented 9 months ago

Oh I tried creating EC2NodeClass and NodePool with plain yamls and had the same errors up until I changed from role to an InstanceProfile

I don't disagree if this is really what is happening, but what I am trying to say is that these issues seem potentially unrelated to me. From looking over the code and reasoning about where we evaluate the instance profile and the role when it comes to making scheduling decisions, we don't have them affect scheduling decisions at all, which is why I'm thinking that it's odd that you are seeing "Could not schedule pod, incompatible with nodepool" and pointing back to the fact that you were using a role vs. an instance profile as the reason. Do you know if the EC2NodeClass that you were referencing was properly resolving the subnets or security groups that you were specifying by checking the status?

One common problem that we see is that subnets don't get resolved; the instance types then aren't able to produce zones, and so you will see the error that you pasted here when you are scheduling pods.
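
A quick way to rule that out (a sketch; the tag value and subnet ID are placeholders): make sure the subnetSelectorTerms match tags that actually exist on the subnets, or pin a subnet by ID so tag discovery isn't a factor; terms in the list are ORed together:

spec:
  subnetSelectorTerms:
    # match by tag; the tag must exist on the subnets themselves
    - tags:
        karpenter.sh/discovery: "np-244"
    # or reference an explicit subnet ID to take discovery out of the picture
    - id: subnet-0123456789abcdef0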

ahoehma commented 7 months ago

@VladFCarsDevops can you please share your final ec2nc, and where did you get the right instance profile from?

VladFCarsDevops commented 7 months ago

@ahoehma You have to create the instance profile separately, give it permissions, and reference it in your ec2nc:

resource "kubectl_manifest" "nodepool" {
  yaml_body = <<-YAML
    apiVersion: karpenter.sh/v1beta1
    kind: NodePool
    metadata:
      name: default
    spec:
      template:
        spec:
          requirements:
            - key: kubernetes.io/arch
              operator: In
              values: ["amd64"]
            - key: kubernetes.io/os
              operator: In
              values: ["linux"]
            - key: karpenter.sh/capacity-type
              operator: In
              values: ["on-demand"]
            - key: karpenter.k8s.aws/instance-category
              operator: In
              values: ["c", "m", "r"]
            - key: karpenter.k8s.aws/instance-generation
              operator: Gt
              values: ["2"]
          nodeClassRef:
            name: default
      limits:
        cpu: 1000
      disruption:
        consolidationPolicy: WhenUnderutilized
        expireAfter: 24h
  YAML

  depends_on = [
    helm_release.karpenter
  ]
}

resource "kubectl_manifest" "ec2nodeclass" {
  yaml_body = <<-YAML
    apiVersion: karpenter.k8s.aws/v1beta1
    kind: EC2NodeClass
    metadata:
      name: default
    spec:
      amiFamily: AL2
      instanceProfile: ${var.workspace}-KarpenterNodeInstanceProfile
      subnetSelectorTerms:
        - tags:
            karpenter.sh/discovery: "${var.workspace}"
      securityGroupSelectorTerms:
        - tags:
            karpenter.sh/discovery: "${var.workspace}"
      amiSelectorTerms:
        - id: "${var.default_ami_id}"
      blockDeviceMappings:
        - deviceName: ${var.ebs_device_name}
          ebs:
            volumeSize: ${var.ebs_volume_size}
            volumeType: ${var.ebs_volume_type}
            encrypted: true
            deleteOnTermination: true
      tags: ${jsonencode(merge(data.aws_default_tags.current.tags, {"Name" = "${var.workspace}-Karpenter-autoscaled-node"}))}    
  YAML
  depends_on = [
    helm_release.karpenter
  ]
}

VladFCarsDevops commented 7 months ago

Do you know if the EC2NodeClass that you were referencing was properly resolving the subnets or security groups that you were specifying by checking the status?

Yes, the SGs and subnets were properly set up.