Open adriantaut opened 3 years ago
OK, so there are two questions here:
Why are the tags being propagated to the whole subtree and how to avoid this? Since you created #16742 and have already done some research (thanks for that!), I think this conversation can keep going on there.
Why did the update fail when the tag changed? This error comes from CloudFormation (and ultimately from the EKS control plane), so I don't have much insight here. I can create a ticket with the EKS team internally, but I would like to be able to reproduce the bug, before doing that. Could you share a short and self-contained example that demonstrates the issue?
This issue has not received a response in a while. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.
Thanks a lot for your reply, @otaviomacedo !
I agree, it makes more sense.
Sure thing, you can find a snippet below:
import { InstanceType, IVpc, SubnetType, SecurityGroup, Peer, Port, CfnLaunchTemplate, ISecurityGroup } from '@aws-cdk/aws-ec2';
import { Cluster, KubernetesVersion, EndpointAccess, Nodegroup } from '@aws-cdk/aws-eks';
import * as iam from '@aws-cdk/aws-iam';
export interface EksClusterProps {
readonly vpc?: IVpc,
readonly version?: SUPPORTED_KUBERNETES_VERSION,
readonly nodeGroup: EksClusterNodeGroupProps,
}
type INSTANCE_METADATA_SERVICE = 'enabled' | 'disabled';
type SUPPORTED_KUBERNETES_VERSION = '1.18' | '1.19';
export interface EksClusterNodeGroupProps {
/**
* PPB EKS Optimized AMI ID - it needs to match with the Kubernetes Version of the Cluster
*/
readonly imageId: string,
/**
* List of LDAP groups to grant access in the EKS NodeGroup instances
*/
readonly ldapGroups?: string[],
/**
* Whether to enable/disable the access to Instance Metadata Service; having this enabled makes
* all the pods to automatically inherit the nodeGroup IAM Role permissions, so if there is no
* specific usecase, we strongly recommend having the IMDS disabled and use IAM Roles for k8s Service Accounts.
*
* @default 'disabled'
*/
readonly instanceMetadataService?: INSTANCE_METADATA_SERVICE;
/**
* Instace Types of the EC2 instances in the NodeGroup
*/
readonly instanceType?: InstanceType,
/**
* Minimum amount of EC2 instances in NodeGroup
*/
readonly minSize?: number,
/**
* Maximum amount of EC2 instances in NodeGroup
*/
readonly maxSize?: number,
}
export class EksCluster extends cdk.Construct {
readonly account: string;
/**
* EKS Cluster instance
*/
readonly cluster: Cluster;
/**
* EKS Cluster name
*/
readonly clusterName: string;
/**
* EKS Cluster NodeGroup
*/
readonly nodeGroup: Nodegroup;
/**
* EKS Cluster VPC
*/
readonly vpc: IVpc;
/**
* Security group that is applied to both control plane and k8s nodes
*/
private clusterSG: ISecurityGroup | undefined;
constructor(scope: cdk.Construct, id: string, props: EksClusterProps) {
super(scope, id);
const stack = cdk.Stack.of(this);
this.account = stack.account;
this.clusterName = `cluster`;
// props.vpc precedes props.vpcName which precedes the `default` vpc
this.vpc = ec2.Vpc.fromLookup(this, 'vpc', {
vpcId: 'vpc-id'
});
// instantiate the EKS Cluster
this.cluster = this.createCluster(props, this.vpc);
// add the NodeGroup according to user specifications
this.nodeGroup = this.addNodeGroup(props.nodeGroup);
}
/**
* Creates the EKS Cluster and provide the access mechanisms through IAM roles and SG rules.
*
* @param props EksClusterProps interface options
* @param vpc the VPC to deploy into
*/
private createCluster(props: EksClusterProps, vpc: IVpc) {
// role with systems:master permissions, used to run kubectl commands
// ToDo: find a way of restricting the usage of this role from Pipelines Deploying k8s applications
const mastersRoleName = `K8sMastersRole`;
const mastersRole = new iam.Role(this, 'Masters-Role', {
roleName: mastersRoleName,
assumedBy: new iam.AccountPrincipal(this.account),
});
// restrict the Control Plane SG; otherwise outbound is widely opened
const controlPlaneSG = new SecurityGroup(this, 'ControlPlaneSecurityGroup', {
vpc,
securityGroupName: `${this.clusterName}-Control-Plane-SG`,
description: 'EKS Control Plane Security Group - restricted',
allowAllOutbound: false,
});
// cluster creation
const cluster = new Cluster(this, 'Cluster', {
clusterName: this.clusterName,
version: KubernetesVersion.of(props.version ?? '1.19'),
vpcSubnets: [{ subnetType: SubnetType.PRIVATE }],
endpointAccess: EndpointAccess.PRIVATE, // no access outside of your VPC,
vpc,
mastersRole,
securityGroup: controlPlaneSG,
});
// Security group applied to control plane, nodes and all pods that do not match explicit SecurityGroupPolicy
this.clusterSG = SecurityGroup.fromSecurityGroupId(this, 'ClusterSecurityGroup', cluster.clusterSecurityGroupId, {
allowAllOutbound: false,
});
// Allow communication between nodes and control plane (see: https://docs.aws.amazon.com/eks/latest/userguide/sec-group-reqs.html)
controlPlaneSG.connections.allowFrom(this.clusterSG, Port.tcp(443), 'Allow nodes to access apiserver');
this.clusterSG.connections.allowFrom(controlPlaneSG, Port.tcpRange(1025, 65535), 'Allow management traffic from control plane to nodes');
// Allow nodes accessing "EKS Lambdas" TODO what is EKS lambdas and why so open
this.clusterSG.addEgressRule(Peer.anyIpv4(), Port.tcp(443), 'Needed by EKS Lambdas');
// Allow nodes accessing Amazon2 Linux YUM repos TODO why so open
this.clusterSG.addEgressRule(Peer.anyIpv4(), Port.tcp(80), 'Amazon2 Yum Repositories');
// Allow inter-node communication for core components like autoscaler, external-dns, alb-controller that are deployed in a single worker node
this.clusterSG.addEgressRule(this.clusterSG, Port.allTraffic(), 'Internode Communication');
return cluster;
}
/**
* Deploys the Compute part of the EKS Cluster through a single NodeGroup for the moment.
* The NodeGroup needs the PPB EKS Optimized AMI ID, which needs to be set in the dependecies file
* of the TLA repository, according with the EksClusterProps.version passed by user.
*
* @param props
*/
private addNodeGroup(props: EksClusterNodeGroupProps): Nodegroup {
// create a launch template, which will be responsible for bootstrapping the worker nodes
const launchTemplate = new CfnLaunchTemplate(this, 'NodeGroup-LaunchTemplate', {
launchTemplateData: {
metadataOptions: {
httpEndpoint: 'enabled',
httpTokens: 'required', // enforce IMDSv2
httpPutResponseHopLimit: ((props.instanceMetadataService ?? 'disabled') == 'disabled') ? 1 : 2, // hopLimit = 1 disables IMDSv2 too
},
userData: cdk.Fn.base64(this.nodeGroupUserData(props.ldapGroups)),
imageId: props.imageId,
instanceType: props?.instanceType?.toString(),
tagSpecifications: [
{
resourceType: 'instance',
tags: [{ key: 'aws-inspector', value: 'true' }],
},
],
},
});
const nodeGroup = this.cluster.addNodegroupCapacity('NodeGroup', {
nodegroupName: `${this.clusterName}-NodeGroup`,
minSize: props?.minSize,
maxSize: props?.maxSize,
subnets: { subnetType: SubnetType.PRIVATE },
launchTemplateSpec: {
id: launchTemplate.ref,
version: launchTemplate.attrLatestVersionNumber,
},
});
return nodeGroup;
}
}
and its usage:
export class AppStack extends cdk.Stack {
constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
super(scope, id, props);
const cluster = new EksCluster(this, 'awsrefeksCluster', {
...props,
version: '1.19',
nodeGroup: {
imageId: image.getImage(this).imageId,
minSize: 1,
maxSize: 2,
instanceType: InstanceType.of(InstanceClass.C5, InstanceSize.LARGE),
}
});
}
}
const app = new cdk.App();
cdk.Tags.of(app).add('CiCd:CdkLibPpbVersion', '7.40.0');
new AppStack(app, 'app-stack');
now, if only that CiCd:CdkLibPpbversion
Tag applied at Cdk App level gets changed from 7.40.0 -> 7.41.0
it will trigger a change both in EKS NodeGroup
and in the EC2 LaunchTemplate
as you can see in the initial message of this issue. Nothing gets changed except the tags, but the error surfaces:
AwsrefeksStack-ephemeral-tauta: creating CloudFormation changeset...
14:32:16 | UPDATE_FAILED | AWS::EKS::Nodegroup | awsrefeksClusterNo...pNodeGroup36246D59
Version and ReleaseVersion updates cannot be combined with other updates
Hope this can help. Thank you a lot!
nodeGroupUserData
doesn't exist:
userData: Fn.base64(this.nodeGroupUserData(props.ldapGroups)),
And what is image.getImage
?
imageId: image.getImage(this).imageId,
Did not include those as was thinking are not really relevant for the issue.
The nodeGroupUserData()
is the method for preparing the userdata of the workers while working with self managed EKS NodeGroups:
/**
* Puts together the EC2 user-data for bootstrapping the EKS Compute through NodeGroup nodes.
*
* @param ldapGroups a list of LDAP Groups that will have access to NodeGroup nodes
*/
private nodeGroupUserData(ldapGroups?: string[]) {
let loginAccessControlTable = '';
let sudoersCommand = '';
if (ldapGroups) {
let sudoersRaw = '';
loginAccessControlTable += '\\n#60 - EKS user defined access';
for (var group of ldapGroups) {
loginAccessControlTable += '\\n+ : (' + group + ') : ALL';
sudoersRaw += '%' + group + ' ALL=(ALL) ALL\\n';
}
sudoersCommand += `echo -e '${sudoersRaw}' > /etc/sudoers.d/75-owners`;
}
// ToDo: remove (CloudAutomation) once the CloudOps MR is merged and new AMIs are built
const userData = `MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="
--==MYBOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"
#!/bin/bash -e
# configure LDAP access in the NodeGroup workers
sed -i '/+ : qualys : ALL/ a ${loginAccessControlTable}' /etc/security/access.conf
${sudoersCommand}
# register workers with the EKS Cluster
set -ex
B64_CLUSTER_CA=${this.cluster.clusterCertificateAuthorityData}
API_SERVER_URL=${this.cluster.clusterEndpoint}
/etc/eks/bootstrap.sh ${this.clusterName} --b64-cluster-ca $B64_CLUSTER_CA --apiserver-endpoint $API_SERVER_URL
--==MYBOUNDARY==--`;
return userData;
}
and for the image.getImage()
you can safely replace that with the AMI ID of an EKS-optimized AMI image provided by AWS.
@otaviomacedo there are several issues related to this behaviour. I think this is a cloudformation limitation ...
EKS has no api call to update nodegroup version and nodgroup config in one. There are only:
UpdateNodegroupConfig UpdateNodegroupVersion
While one is running, you will get the error, that only 1 update can be performed at a time, if you try to start the other one. So for eks everything is fine.
And I think to prevent errors (which one should be done first if both api calls have to be performed? What should be done if something goes wrong?) I think this "error" was introduced by cloudformation. I for my self created a custom resource performing the nodegroup update after the version upgrade ... somehow decouple both api calls (but I'm only 10 minutes away from the custom resource timeout of 1 hour).
But in this example, the Tag change leads to a change of the launchtemplate tag specifications (that creates a new launch template version) and change of the nodegroup tags itself. So both api calls have to be performed and this is what cloudformation won't do.
It's allways the same problem that there are many scenarios where you have to change both, but cloudformation won't handle this. Also see #13602
It would be nice if this could be handled by cloudformation. Maybe there should be a property on the cloudformation nodegroup resource like updatestrategy: configFirst|versionFirst ...
Thanks, @markussiebert. I think you're spot on! I've raised the issue internally with the CloudFormation team.
@otaviomacedo any news from the cloudformation team?
same issue here
I have the same problem.
Just added cdk.Tags.of(this).add('foo', 'bar');
to my stack and got the Version and release version updates cannot be combined with other updates
error.
Would making the node-group
dependsOn the launch-template
have any effect of ordering?
No, it won't Help... I think I analyzed it enough and AWS could solve it If they want - I think AWS thinks that this behaviour is near perfect... Otherwise they would Change it
David Talbot @.***> schrieb am Mi., 22. Juni 2022, 01:48:
I have the same problem.
Just added cdk.Tags.of(this).add('foo', 'bar'); to my stack and got the Version and release version updates cannot be combined with other updates error.
Would making the node-group dependsOn the launch-template have any effect of ordering?
— Reply to this email directly, view it on GitHub https://github.com/aws/aws-cdk/issues/16644#issuecomment-1162469083, or unsubscribe https://github.com/notifications/unsubscribe-auth/AD7LKUDFI3XRTNVVQ2M6TIDVQJIFDANCNFSM5EV5JPPA . You are receiving this because you were mentioned.Message ID: @.***>
@otaviomacedo Hi! Any news from the team(s)? Thanks
Hello @otaviomacedo , any signs of movement on this one? Cheers!
Has anyone at least found some workaround for this issue? Would it help to exclude the NodeGroup or the LaunchTemplate from the tagging? Is there a way to do this without having the Instances created by the ASG loose their tags?
https://github.com/aws/aws-cdk/issues/16742 @dbaumgarten this issue was also raised...nope we don't really have a solution/workaround yet
Is there at least a way I can (semi)manually change the tags of my ressources? In a way that works fine with future runs of cdk? I am currently in the situation were I would need to change some tags and can't do this without completely re-creating the whole stack...
Maybe something like:
I had created a custom resource lambda. Maybe I can publish it. It handles the nodegroup updates for us...
Am Mo., 23. Jan. 2023 um 15:59 Uhr schrieb dbaumgarten < @.***>:
Is there at least a way I can (semi)manually change the tags of my ressources? In a way that works fine with future runs of cdk? I am currently in the situation were I would need to change some tags and can't do this without completely re-creating the whole stack...
— Reply to this email directly, view it on GitHub https://github.com/aws/aws-cdk/issues/16644#issuecomment-1400491229, or unsubscribe https://github.com/notifications/unsubscribe-auth/AD7LKUDG3GVNP7WN3B3DSD3WT2MFNANCNFSM5EV5JPPA . You are receiving this because you were mentioned.Message ID: @.***>
-- Freundliche Grüße
Markus Siebert
I had created a custom resource lambda. Maybe I can publish it. It handles the nodegroup updates for us...
@markussiebert did you end up publishing it? Would be a useful workaround for now...
AWS CDK seems to reveal so many flaws of CloudFormation. Since it could take years until AWS would fix all that bugs (or build a better base for turning the actual state to the target state), we removed as workaround tags from the NodeGroup (and also the problematic CfnUserGroup) with following aspect on our Stack construct:
@jsii.implements(aws_cdk.IAspect)
class TagRemoverAspect:
def visit(self, node):
# Per default we exclude following resource types from tagging:
# * Elasticache::UserGroup - has the error: "extraneous key [tags] is not
if isinstance(node, aws_cdk.aws_elasticache.CfnUserGroup):
node.add_deletion_override("Properties.Tags")
if isinstance(node, aws_cdk.aws_eks.Nodegroup):
node.node.default_child.add_deletion_override("Properties.Tags")
aws_cdk.Aspects.of(self).add(TagRemoverAspect())
I worked around this by enabling myself to pass in a static launch template version number to my EKS stack (set to the current version) and then remove it to update to the latest version.
2 CDK runs, but it worked and gives me an escape hatch in the future.
lastestVersionNumber or versionNumber causes trouble. use defaultVersionNumber and then manually change it on AWS console.
This issue has received a significant amount of attention so we are automatically upgrading its priority. A member of the community will see the re-prioritization and provide an update on the issue.
Hello guys,
We have a CDK Stack that, among others, it bundles together the creation of an EKS Cluster with an EKS Managed NodeGroup. For cost/analytic purposes we have some mandatory tags being applied to all the resources of a Stack. If a Tag is being added/changed/removed, the change is propagated in the CDK tree to all the resources as you can see below.
Basically the only change is one of the Tag's values which is applied to all the resources, but we are interested in particular about the
AWS::EKS::NodeGroup
andAWS::EC2::LaunchTemplate
as the Stack update fails with the following error:Did someone else face this issue before and might be able to advise? Thank you a lot!
What did you expect to happen?
Have the Tags updated without their change to cause the
NodeGroup
and theLaunchTemplate
update.Environment
This is :bug: Bug Report