aws-quickstart / cdk-eks-blueprints

AWS Quick Start Team
Apache License 2.0

add ipv6 support to karpenter&vpc cni addons (mostly IAM permissions issue) #1079

Open neoakris opened 6 days ago

neoakris commented 6 days ago

Describe the feature

I'm using a modified fork of eks blueprints to allow deployment of an ipv6 cluster.
My fork is heavily modified, and uses a lot of hacks to get things to work, so I don't have any code worth contributing upstream.

That said I've been able to test an EKS Blueprints based ipv6 cluster deployed in a dual stack vpc and I can happily inform you that most things work with no significant changes needed.

The only thing that didn't work out of the box was karpenter on an ipv6 cluster. This was odd, because the Managed Node Groups worked fine and only the karpenter nodes failed. Comparing the two, I found that one was missing a permission the other had. Adding the permission alone wasn't enough: nothing changed until I restarted the AWS CNI pods. After updating the IAM permissions and restarting the AWS CNI pods, karpenter nodes scheduled ipv6 pods correctly, just like managed node group nodes.

If we bake the permissions into the IaC then it should work from the start.

Use Case

This is needed for eks clusters in ipv6 deployed to a pre-existing dualstack vpc.

Without this feature, the managed node groups of an ipv6 eks cluster work, but nodes added by karpenter can't provision pods because of an aws-cni error caused by missing IAM permissions.

Proposed Solution

I discovered that the role attached to the managed node groups had the following inline IAM policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "ec2:AssignIpv6Addresses",
                "ec2:UnassignIpv6Addresses"
            ],
            "Resource": "arn:aws:ec2:*:*:network-interface/*",
            "Effect": "Allow"
        }
    ]
}

I propose:

  1. If anyone has connections to AWS staff, ask them to add these permissions to the AWS managed policy named "AmazonEKS_CNI_Policy". (That will take a while to implement, I'm sure, due to bureaucracy, so we should do the following in the meantime.)
  2. Add this inline permission to the karpenter-generated node role.
  3. Add this inline permission to the vpc-cni addon when serviceAccountPolicies is specified as a VpcCni property.
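
To check whether a given role already carries these permissions (for example, when comparing the managed node group role against the karpenter node role), something like the following could be used. This is a minimal sketch, not part of EKS Blueprints: the helper name missingIpv6Actions and the second sample policy are made up for illustration, and the check only looks for exact action names in Allow statements, ignoring wildcards, Deny statements, conditions and resource scoping.

```typescript
// Minimal IAM policy document shapes (only the fields used here).
interface PolicyStatement {
  Effect: "Allow" | "Deny";
  Action: string | string[];
  Resource: string | string[];
}

interface PolicyDocument {
  Version: string;
  Statement: PolicyStatement[];
}

// The two actions from the inline policy found on the managed node group role.
const IPV6_ACTIONS: readonly string[] = [
  "ec2:AssignIpv6Addresses",
  "ec2:UnassignIpv6Addresses",
];

// Hypothetical helper: returns the IPv6 actions that no Allow statement names.
// Simplification: exact (case-insensitive) matches only; wildcards are not expanded.
function missingIpv6Actions(docs: PolicyDocument[]): string[] {
  const allowed: Record<string, boolean> = {};
  for (const doc of docs) {
    for (const stmt of doc.Statement) {
      if (stmt.Effect !== "Allow") continue;
      const actions = Array.isArray(stmt.Action) ? stmt.Action : [stmt.Action];
      for (const action of actions) allowed[action.toLowerCase()] = true;
    }
  }
  return IPV6_ACTIONS.filter((action) => !allowed[action.toLowerCase()]);
}

// The inline policy quoted in this issue.
const managedNodeGroupInline: PolicyDocument = {
  Version: "2012-10-17",
  Statement: [
    {
      Effect: "Allow",
      Action: ["ec2:AssignIpv6Addresses", "ec2:UnassignIpv6Addresses"],
      Resource: "arn:aws:ec2:*:*:network-interface/*",
    },
  ],
};

// Made-up policy standing in for a role that lacks the IPv6 actions.
const exampleOtherPolicy: PolicyDocument = {
  Version: "2012-10-17",
  Statement: [
    { Effect: "Allow", Action: "ec2:DescribeNetworkInterfaces", Resource: "*" },
  ],
};

console.log(missingIpv6Actions([managedNodeGroupInline]));
console.log(missingIpv6Actions([exampleOtherPolicy]));
```

Feeding the helper every policy document attached to a role gives a quick list of which of the two actions still need to be granted.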

Other Information

I also manually patched karpenter's EC2NodeClass. I'm not sure if this was also needed to make it work, but I did notice the karpenter addon yaml generation doesn't allow this to be customized and maybe it should.

apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default-ec2nodeclass
spec:
  amiFamily: AL2 
  blockDeviceMappings: []
  detailedMonitoring: false
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: enabled    #<-- CHANGED FROM disabled
    httpPutResponseHopLimit: 2
    httpTokens: required
  role: dev2-eks-dev2ekskarpenternoderoleDEC59317-OZFd5l8IdB5e
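
If the karpenter addon were to allow this to be customized, one possible shape is a set of user overrides merged over the generated defaults. The sketch below is hypothetical and is not the addon's actual API: the names MetadataOptions, generatedDefaults and withMetadataOverrides are invented here, and the defaults assume that only httpProtocolIPv6 differed from the patched manifest above.

```typescript
// Metadata options as they appear in the EC2NodeClass spec above.
interface MetadataOptions {
  httpEndpoint: "enabled" | "disabled";
  httpProtocolIPv6: "enabled" | "disabled";
  httpPutResponseHopLimit: number;
  httpTokens: "required" | "optional";
}

// Assumed generated values: the manifest above with httpProtocolIPv6 set back
// to "disabled", the value it had before the manual patch.
const generatedDefaults: MetadataOptions = {
  httpEndpoint: "enabled",
  httpProtocolIPv6: "disabled",
  httpPutResponseHopLimit: 2,
  httpTokens: "required",
};

// Hypothetical helper: user-supplied overrides win over the generated defaults.
// Returns a new object and leaves the defaults untouched.
function withMetadataOverrides(
  defaults: MetadataOptions,
  overrides: Partial<MetadataOptions>
): MetadataOptions {
  return { ...defaults, ...overrides };
}

// Equivalent of the manual patch: enable the IPv6 metadata endpoint only.
const patched = withMetadataOverrides(generatedDefaults, {
  httpProtocolIPv6: "enabled",
});

console.log(patched);
```

Keeping the override partial means an IPv6 cluster only has to state the one field it changes, while everything else stays at whatever the addon generates.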

CDK version used

2.133.0 (build dcc1e75)

EKS Blueprints Version

1.15.1

Node.js Version

v20.17.0

Environment details (OS name and version, etc.)

Mac OS Sonoma 14.6.1

neoakris commented 6 days ago

Note: It seems the VPC CNI addon can't support this, because that interface only allows AWS Managed AddOns, and no AWS Managed Addons (with reasonably restrictive permissions) support IPv6 VPCs.

The VPC CNI Blueprints Addon would need to be modified to be more flexible and potentially add Inline IAM policy in addition to AWS Managed.

Likewise, I found the karpenter addon doesn't seem to allow specifying a custom nodeRole, so it can't be fixed with config.

Both cases need a code change to allow karpenter to support IPv6 clusters.

neoakris commented 6 days ago

Wait, actually this might work for karpenter. I missed it when I searched (ctrl + f) for "policy" and "role". I'll give it another try with InstanceProfile:

https://github.com/aws-quickstart/cdk-eks-blueprints/blob/main/lib/addons/karpenter/index.ts#L118

neoakris commented 2 days ago

Note: the Karpenter addon's implementation details prevent instance profiles from working (there's an issue ticket for it).

In the comment linked below, yubingjiaocn lists a usable workaround:
https://github.com/aws-quickstart/cdk-eks-blueprints/issues/893#issuecomment-2162196673

neoakris commented 2 days ago

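Here's a workaround that worked for me.
Note there's some redundancy in here (reliability in depth): both the VPC CNI pod role and the Karpenter node role are given the rights.

import * as iam from 'aws-cdk-lib/aws-iam';

// Runs inside a class that exposes this.stack and this.config.id.
const ipv6_support_policy_statement = new iam.PolicyStatement({
    actions: ['ec2:AssignIpv6Addresses', 'ec2:UnassignIpv6Addresses'],
    resources: ['arn:aws:ec2:*:*:network-interface/*'],
});
// Look up the roles created by the Karpenter and VPC CNI addons in the construct tree.
const scope = this.stack.node.findChild(this.config.id);
const karpenter_node_role = scope.node.tryFindChild('karpenter-node-role') as iam.Role | undefined;
const aws_vpc_cni_pod_role = scope.node.tryFindChild('aws-node-sa')?.node.tryFindChild('Role') as iam.Role | undefined;
// tryFindChild returns undefined when a construct is absent, so fail with a clear message.
if (!karpenter_node_role || !aws_vpc_cni_pod_role) throw new Error('karpenter-node-role or aws-node-sa Role not found');
karpenter_node_role.addToPolicy(ipv6_support_policy_statement);
aws_vpc_cni_pod_role.addToPolicy(ipv6_support_policy_statement);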