Open · bradwatsonaws opened this issue 3 months ago
By taking the CloudFormation template that eksctl generates, I was able to successfully deploy a nodegroup that does not wrap the user data in a uniquely generated BOUNDARY. All unique values below have been scrubbed.
```yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: 'EKS Managed Nodes (SSH access: false)'
Mappings:
  ServicePrincipalPartitionMap:
    aws:
      EC2: ec2.amazonaws.com
      EKS: eks.amazonaws.com
      EKSFargatePods: eks-fargate-pods.amazonaws.com
    aws-cn:
      EC2: ec2.amazonaws.com.cn
      EKS: eks.amazonaws.com
      EKSFargatePods: eks-fargate-pods.amazonaws.com
    aws-iso:
      EC2: ec2.c2s.ic.gov
      EKS: eks.amazonaws.com
      EKSFargatePods: eks-fargate-pods.amazonaws.com
    aws-iso-b:
      EC2: ec2.sc2s.sgov.gov
      EKS: eks.amazonaws.com
      EKSFargatePods: eks-fargate-pods.amazonaws.com
    aws-us-gov:
      EC2: ec2.amazonaws.com
      EKS: eks.amazonaws.com
      EKSFargatePods: eks-fargate-pods.amazonaws.com
Resources:
  LaunchTemplate:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateData:
        BlockDeviceMappings:
          - DeviceName: /dev/sda1
            Ebs:
              Encrypted: false
              Iops: 3000
              Throughput: 125
              VolumeSize: 80
              VolumeType: gp3
        ImageId: ami-0b2e91234574a54c0
        MetadataOptions:
          HttpPutResponseHopLimit: 2
          HttpTokens: required
        SecurityGroupIds:
          - !ImportValue 'eksctl-rhel-eks-cluster::ClusterSecurityGroupId'
        TagSpecifications:
          - ResourceType: instance
            Tags:
              - Key: Name
                Value: rhel-eks-rhel-eks-cfn-Node
              - Key: alpha.eksctl.io/nodegroup-type
                Value: managed
              - Key: nodegroup-name
                Value: rhel-eks-cfn
              - Key: alpha.eksctl.io/nodegroup-name
                Value: rhel-eks-cfn
          - ResourceType: volume
            Tags:
              - Key: Name
                Value: rhel-eks-rhel-eks-cfn-Node
              - Key: alpha.eksctl.io/nodegroup-type
                Value: managed
              - Key: nodegroup-name
                Value: rhel-eks-cfn
              - Key: alpha.eksctl.io/nodegroup-name
                Value: rhel-eks-cfn
          - ResourceType: network-interface
            Tags:
              - Key: Name
                Value: rhel-eks-rhel-eks-cfn-Node
              - Key: alpha.eksctl.io/nodegroup-type
                Value: managed
              - Key: nodegroup-name
                Value: rhel-eks-cfn
              - Key: alpha.eksctl.io/nodegroup-name
                Value: rhel-eks-cfn
        UserData:
          Fn::Base64: !Sub |
            MIME-Version: 1.0
            Content-Type: multipart/mixed; boundary="BOUNDARY"

            --BOUNDARY
            Content-Type: application/node.eks.aws

            ---
            apiVersion: node.eks.aws/v1alpha1
            kind: NodeConfig
            spec:
              cluster:
                name: rhel-eks
                apiServerEndpoint: https://5B3FABCDE05F2D983E65079309B80C06.gr7.us-gov-east-1.eks.amazonaws.com
                certificateAuthority: LS0tLS1CRULS0tLS0K
                cidr: 10.100.0.0/16

            --BOUNDARY
            Content-Type: text/x-shellscript;

            #!/bin/bash
            set -ex
            systemctl enable kubelet.service
            systemctl disable nm-cloud-setup.timer
            systemctl disable nm-cloud-setup.service
            reboot
            --BOUNDARY--
      LaunchTemplateName: !Sub '${AWS::StackName}'
  ManagedNodeGroup:
    Type: AWS::EKS::Nodegroup
    Properties:
      ClusterName: rhel-eks
      InstanceTypes:
        - t3.medium
      Labels:
        alpha.eksctl.io/cluster-name: rhel-eks
        alpha.eksctl.io/nodegroup-name: rhel-eks-cfn
        role: worker
      LaunchTemplate:
        Id: !Ref 'LaunchTemplate'
      NodeRole: !GetAtt 'NodeInstanceRole.Arn'
      NodegroupName: rhel-eks-cfn
      ScalingConfig:
        DesiredSize: 2
        MaxSize: 2
        MinSize: 2
      Subnets:
        - subnet-0f034415c5b1237f0
        - subnet-0bdba07340be1232f
        - subnet-05c651fa62a123b2c
      Tags:
        alpha.eksctl.io/nodegroup-name: rhel-eks-cfn
        alpha.eksctl.io/nodegroup-type: managed
        nodegroup-name: rhel-eks-cfn
  NodeInstanceRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Action:
              - sts:AssumeRole
            Effect: Allow
            Principal:
              Service:
                - !FindInMap
                  - ServicePrincipalPartitionMap
                  - !Ref 'AWS::Partition'
                  - EC2
        Version: '2012-10-17'
      ManagedPolicyArns:
        - !Sub 'arn:${AWS::Partition}:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly'
        - !Sub 'arn:${AWS::Partition}:iam::aws:policy/AmazonEKSWorkerNodePolicy'
        - !Sub 'arn:${AWS::Partition}:iam::aws:policy/AmazonEKS_CNI_Policy'
        - !Sub 'arn:${AWS::Partition}:iam::aws:policy/AmazonSSMManagedInstanceCore'
      Path: /
      Tags:
        - Key: Name
          Value: !Sub '${AWS::StackName}/NodeInstanceRole'
```
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
This issue was closed because it has been stalled for 5 days with no activity.
What were you trying to accomplish?
I am trying to create a managed node group with my own multipart user data script supplied through overrideBootstrapCommand. This multipart user data script should run a mix of bash commands and also satisfy the requirements for nodeadm node initialization.
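For reference, the shape of the nodegroup definition is roughly the sketch below. This is illustrative only; the nodegroup name, AMI ID, and sizes are placeholders rather than my exact values, and the multipart body itself is shown in the example further down.

```yaml
# Illustrative ClusterConfig sketch -- nodegroup name, AMI ID, and sizes are placeholders.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: rhel-eks
  region: us-gov-east-1
managedNodeGroups:
  - name: rhel-eks-nodeadm
    ami: ami-0123456789abcdef0      # custom RHEL AMI that bootstraps via nodeadm
    instanceType: t3.medium
    desiredCapacity: 2
    minSize: 2
    maxSize: 2
    overrideBootstrapCommand: |
      # the multipart MIME user data shown in the example below goes here
```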
What happened?
When eksctl creates the launch template and takes the user data script defined by the user, it appears to add its own multipart boundaries, which prevents the user-defined multipart user data script from working as expected. The result is that the node group is created with a launch template as usual; however, the nodes are unable to join the cluster because nodeadm defaults to reading its configuration from IMDS, and the eksctl-created boundaries in the multipart user data prevent nodeadm from finding a configuration there.
Example of the user-defined multipart user data script passed into overrideBootstrapCommand:
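It is essentially the same multipart document that appears verbatim in the launch template UserData above, roughly as follows (API server endpoint and certificate authority are scrubbed placeholders):

```
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="BOUNDARY"

--BOUNDARY
Content-Type: application/node.eks.aws

---
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: rhel-eks
    apiServerEndpoint: https://<scrubbed>.gr7.us-gov-east-1.eks.amazonaws.com
    certificateAuthority: <scrubbed>
    cidr: 10.100.0.0/16

--BOUNDARY
Content-Type: text/x-shellscript;

#!/bin/bash
set -ex
systemctl enable kubelet.service
systemctl disable nm-cloud-setup.timer
systemctl disable nm-cloud-setup.service
reboot
--BOUNDARY--
```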
Resulting user data script created by eksctl in the node group launch template:
As you can hopefully see, eksctl is generating its own multipart script with its own uniquely generated boundaries. This prevents the user-defined boundaries from being respected.
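Illustratively, the launch template user data ends up shaped something like the following, with my document nested inside an eksctl-generated outer archive (the outer boundary and inner part headers here are placeholders for whatever eksctl actually generates):

```
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="<eksctl-generated-boundary>"

--<eksctl-generated-boundary>
Content-Type: text/x-shellscript; charset="us-ascii"

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="BOUNDARY"

--BOUNDARY
...user-defined NodeConfig and shell-script parts from the example above...
--BOUNDARY--

--<eksctl-generated-boundary>--
```

With this nesting, nodeadm does not find a NodeConfig in the user data it reads from IMDS, and the nodes never join the cluster.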
How to reproduce it?
A zsh script with parameters passed in that match the parameters defined at the top of the script:
Logs

```
2024-07-18 08:51:16 [ℹ] will use version 1.29 for new nodegroup(s) based on control plane version
2024-07-18 08:51:18 [ℹ] nodegroup "rhel-eks-nodeadmn-new" will use "ami-095c7b500f70da3d0" [AmazonLinux2/1.29]
2024-07-18 08:51:18 [ℹ] 2 existing nodegroup(s) (rhel-eks-github,rhel-eks-nodeadm) will be excluded
2024-07-18 08:51:18 [ℹ] 1 nodegroup (rhel-eks-nodeadmn-new) was included (based on the include/exclude rules)
2024-07-18 08:51:18 [ℹ] will create a CloudFormation stack for each of 1 managed nodegroups in cluster "rhel-eks"
2024-07-18 08:51:19 [ℹ] 2 sequential tasks: { fix cluster compatibility, 1 task: { 1 task: { create managed nodegroup "rhel-eks-nodeadmn-new" } } }
2024-07-18 08:51:19 [ℹ] checking cluster stack for missing resources
2024-07-18 08:51:19 [ℹ] cluster stack has all required resources
2024-07-18 08:51:19 [ℹ] building managed nodegroup stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 08:51:20 [ℹ] deploying stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 08:51:20 [ℹ] waiting for CloudFormation stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 08:51:50 [ℹ] waiting for CloudFormation stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 08:52:42 [ℹ] waiting for CloudFormation stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 08:54:03 [ℹ] waiting for CloudFormation stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 08:55:08 [ℹ] waiting for CloudFormation stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 08:56:09 [ℹ] waiting for CloudFormation stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 08:56:59 [ℹ] waiting for CloudFormation stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 08:58:12 [ℹ] waiting for CloudFormation stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 08:59:49 [ℹ] waiting for CloudFormation stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 09:00:26 [ℹ] waiting for CloudFormation stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 09:01:30 [ℹ] waiting for CloudFormation stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 09:02:30 [ℹ] waiting for CloudFormation stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 09:03:54 [ℹ] waiting for CloudFormation stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 09:05:07 [ℹ] waiting for CloudFormation stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 09:06:54 [ℹ] waiting for CloudFormation stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 09:08:02 [ℹ] waiting for CloudFormation stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 09:09:12 [ℹ] waiting for CloudFormation stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 09:10:27 [ℹ] waiting for CloudFormation stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 09:12:07 [ℹ] waiting for CloudFormation stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 09:13:38 [ℹ] waiting for CloudFormation stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 09:14:53 [ℹ] waiting for CloudFormation stack "eksctl-rhel-eks-nodegroup-rhel-eks-nodeadmn-new"
2024-07-18 09:14:53 [ℹ] 1 error(s) occurred and nodegroups haven't been created properly, you may wish to check CloudFormation console
2024-07-18 09:14:53 [ℹ] to cleanup resources, run 'eksctl delete nodegroup --region=us-gov-east-1 --cluster=rhel-eks --name=' for each of the failed nodegroup
2024-07-18 09:14:53 [✖] waiter state transitioned to Failure
Error: failed to create nodegroups for cluster "rhel-eks"
```
Anything else we need to know?
OS: macOS
Authentication: SSO through AWS CLI and Okta
Versions
eksctl: 0.187.0