aws-quickstart / cdk-eks-blueprints

AWS Quick Start Team
Apache License 2.0

Can't install NTH with Karpenter #441

Closed jdwil closed 1 year ago

jdwil commented 2 years ago

Describe the bug

Not sure if this is a bug, or I'm just doing it wrong... but all the documentation I've read for Karpenter suggests using AWS Node Termination Handler to gracefully handle spot instance termination.

Expected Behavior

I'd expect an EKS cluster to be deployed with both Karpenter and AwsNodeTerminationHandler installed (along with the rest of my addons).

Current Behavior

.../node_modules/@aws-quickstart/eks-blueprints/lib/addons/aws-node-termination-handler/index.ts:72
    assert(asgCapacity && asgCapacity.length > 0, 'AWS Node Termination Handler is only supported for self-managed nodes');
    ^
AssertionError [ERR_ASSERTION]: AWS Node Termination Handler is only supported for self-managed nodes

...

    at Module.load (node:internal/modules/cjs/loader:981:32)
    at Function.Module._load (node:internal/modules/cjs/loader:822:12) {
  generatedMessage: false,
  code: 'ERR_ASSERTION',
  actual: false,
  expected: true,
  operator: '=='
}

Reproduction Steps

import 'source-map-support/register';
import * as cdk from 'aws-cdk-lib';
import {aws_eks} from 'aws-cdk-lib';
import * as blueprints from '@aws-quickstart/eks-blueprints';
import * as process from "process";

const app = new cdk.App();

const karpenterAddonProps = {
    provisionerSpecs: {
        'node.kubernetes.io/instance-type': [
            'a1.medium',
            'a1.large',
            'a1.xlarge',
            'a1.2xlarge',
            'a1.4xlarge',
            'c5.large',
            'c5.xlarge',
            'c5.2xlarge',
            'c5.4xlarge',
            'd3.xlarge',
            'd3.2xlarge',
            'd3.4xlarge',
            'm5.large',
            'm5a.large',
            'm5.xlarge',
            'm5a.xlarge',
            'm5.2xlarge',
            'm5a.2xlarge',
            'm5.4xlarge',
            'm5a.4xlarge',
            't3.nano',
            't3.micro',
            't3.small',
            't3.medium',
            't3.large',
            't3.xlarge',
            't3.2xlarge'
        ],
        'topology.kubernetes.io/zone': ['us-west-2a'],
        'kubernetes.io/arch': ['amd64','arm64'],
        'karpenter.sh/capacity-type': ['spot','on-demand'],
    },
    subnetTags: {
        'aws:cloudformation:stack-name': 'EKS',
    },
    securityGroupTags: {
        'aws:eks:cluster-name': 'EKS',
    },
};

const addOns: Array<blueprints.ClusterAddOn> = [
    new blueprints.addons.CalicoOperatorAddOn(),
    new blueprints.addons.MetricsServerAddOn(),
    new blueprints.addons.ContainerInsightsAddOn(),
    new blueprints.addons.AwsLoadBalancerControllerAddOn(),
    new blueprints.addons.VpcCniAddOn(),
    new blueprints.addons.CoreDnsAddOn(),
    new blueprints.addons.KubeProxyAddOn(),
    new blueprints.addons.XrayAddOn(),
    new blueprints.addons.AwsNodeTerminationHandlerAddOn(),
    new blueprints.addons.KarpenterAddOn(karpenterAddonProps),
];

blueprints.EksBlueprint.builder()
    .version(aws_eks.KubernetesVersion.V1_21)
    .region(process.env.REGION)
    .account(process.env.ACCOUNT)
    .addOns(...addOns)
    .build(app, 'EKS');

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

2.32.0

EKS Blueprints Version

1.1.0

Node.js Version

v16.13.2

Environment details (OS name and version, etc.)

Debian sid

Other information

Thanks in advance for any help on this.

youngjeong46 commented 2 years ago

@jdwil thanks for submitting the issue. AWS Node Termination Handler is only available for self-managed nodegroups on EKS. From your setup, you are using a default nodegroup, which is a managed nodegroup.

To deploy EKS Blueprints with self-managed nodegroup, please take a look here.
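To make the failure concrete, here is a minimal sketch of the kind of check that produces the assertion in the stack trace. It is not the library's actual code: the `ClusterCapacity` interface and the `nthCheckResult` helper are hypothetical names used only for illustration, and only the error message is taken from the report above.

```typescript
// Hypothetical stand-in for the capacity a blueprint might inspect.
// This is NOT a type from @aws-quickstart/eks-blueprints.
interface ClusterCapacity {
    asgNames: string[];              // self-managed Auto Scaling groups
    managedNodeGroupNames: string[]; // EKS managed node groups
}

const NTH_ERROR = 'AWS Node Termination Handler is only supported for self-managed nodes';

// Mirrors the shape of the assertion shown in the stack trace:
// the check passes only when at least one Auto Scaling group exists.
function nthCheckResult(capacity: ClusterCapacity): string {
    const asgCapacity = capacity.asgNames;
    if (!(asgCapacity && asgCapacity.length > 0)) {
        return NTH_ERROR;
    }
    return 'ok';
}

// A cluster with only the default managed node group fails the check,
// regardless of which other add-ons (such as Karpenter) are present.
const defaultCluster: ClusterCapacity = { asgNames: [], managedNodeGroupNames: ['default'] };

// A cluster backed by a self-managed Auto Scaling group passes.
const asgCluster: ClusterCapacity = { asgNames: ['workers'], managedNodeGroupNames: [] };

console.log(nthCheckResult(defaultCluster));
console.log(nthCheckResult(asgCluster));
```

Under this reading, the check looks only at Auto Scaling group capacity, which is why adding Karpenter to the add-on list does not change the outcome.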

vumdao commented 2 years ago

@youngjeong46 Why doesn't cdk-eks-blueprints support installing NTH with managed nodegroups? If we launch an EKS cluster with managed nodegroups without eks-blueprints, NTH can be installed using the Helm chart.

Furthermore, the Auto Scaling Group cluster provider in cdk-eks-blueprints does not give us full control over settings such as the instance capacity type (for example, spot instances).

vumdao commented 2 years ago

Same ticket https://github.com/aws-quickstart/cdk-eks-blueprints/issues/387

vumdao commented 2 years ago

@jdwil you may want to vote for this ticket https://github.com/aws-quickstart/cdk-eks-blueprints/issues/392 if you're installing NTH

jdwil commented 2 years ago

@youngjeong46 Thank you for the link. I will try switching to ASG as soon as I can. I still don't quite understand why the rule against NTH with MNGs applies when you're installing Karpenter. I have a managed node group, but Karpenter scales the nodes and does not use a node group. I'm very new to all this, so it's probably something obvious I'm missing.

shapirov103 commented 2 years ago

@jdwil that is a good point about Karpenter. Initially there was no restriction on NTH; it was added in response to a separate issue raised about NTH and MNG. We will review it, and the easiest path seems to be simply relaxing the constraint.

javydekoning commented 2 years ago

@youngjeong46 can we relax this constraint? I'm running into the same issue with a cluster that only runs Fargate+Karpenter.

youngjeong46 commented 1 year ago

@javydekoning

This has been resolved: NTH is supported with Karpenter up to v0.19, the version at which Karpenter provides native interruption handling. Closing the issue.
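The version cutoff mentioned above can be sketched as a small helper that decides whether NTH is still worth installing alongside a given Karpenter version. This is only an illustration: `needsNth`, `isBelow`, and `parseVersion` are hypothetical names, not part of eks-blueprints, and the `0.19.0` threshold is an assumption based on reading the comment as "versions before v0.19 need NTH".

```typescript
// Assumption: native interruption handling starts at Karpenter v0.19.0,
// so only earlier versions need NTH for graceful spot termination.
const NATIVE_INTERRUPTION_VERSION = '0.19.0';

// Parse "v0.18.1" or "0.18.1" into numeric parts: [0, 18, 1].
function parseVersion(version: string): number[] {
    return version.replace(/^v/, '').split('.').map((part) => parseInt(part, 10));
}

// Compare numerically, part by part, so that 0.9.x sorts before 0.19.x
// (a plain string comparison would get this wrong).
function isBelow(version: string, threshold: string): boolean {
    const a = parseVersion(version);
    const b = parseVersion(threshold);
    for (let i = 0; i < Math.max(a.length, b.length); i++) {
        const x = a[i] ?? 0;
        const y = b[i] ?? 0;
        if (x !== y) {
            return x < y;
        }
    }
    return false; // equal versions are not below the threshold
}

// Hypothetical helper: true when the Karpenter version predates the
// assumed native interruption handling release.
function needsNth(karpenterVersion: string): boolean {
    return isBelow(karpenterVersion, NATIVE_INTERRUPTION_VERSION);
}

console.log(needsNth('v0.18.1'));
console.log(needsNth('v0.19.0'));
```

Comparing numeric parts rather than raw strings is the main design choice here; whether v0.19 itself falls on the NTH side of the cutoff should be confirmed against the Karpenter release notes.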