Closed: vpopiolrccl closed this issue 4 months ago
While doing some troubleshooting, I deleted the LogGroup that the message referred to, and after executing a new cdk deploy, it completed successfully. This doesn't resolve the issue, as we have multiple clusters running in different AWS accounts that we would like to continue maintaining with the CDK blueprints. For those other accounts, access to make this type of change (deleting a Log Group) is very restricted.
@vpopiolrccl we just released 1.15.1 as a patch release for some of the backwards compatibility issues. Please give it a try with a cluster that was produced with 1.14.1, and if the issue persists, I will need a blueprint example to reproduce.
Thanks @shapirov103. I also tried 1.15.1 before opening the issue, with the same results.
Looking further into the log, I see that it is most likely related to the cluster log implementation addressing this issue: https://github.com/aws-quickstart/cdk-eks-blueprints/issues/997
Let me take a look at whether we can introduce an option to reuse the existing log group for that.
But that Log Group seems to belong to the Step Function used by the Custom Resource
Yes, the native CDK implementation of the logging is using step functions to orchestrate log creation after the cluster. I am unclear about the name collision. Do I assume correctly that you have the control plane logging enabled with the blueprint?
Currently, no. But good point; I will most likely change this setting.
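For reference, a minimal sketch of what enabling control plane logging through the blueprints builder could look like. It assumes the enableControlPlaneLogTypes builder method and the ControlPlaneLogType enum associated with the logging work in issue #997; verify the exact names against the blueprints version in use.

import { App } from 'aws-cdk-lib';
import * as blueprints from '@aws-quickstart/eks-blueprints';

const app = new App();

// Assumed API: enableControlPlaneLogTypes(...) forwards the selected log types
// to the EKS control plane logging configuration.
blueprints.EksBlueprint.builder()
    .enableControlPlaneLogTypes(
        blueprints.ControlPlaneLogType.API,
        blueprints.ControlPlaneLogType.AUDIT,
        blueprints.ControlPlaneLogType.AUTHENTICATOR)
    .addOns(new blueprints.VpcCniAddOn())
    .build(app, 'logging-enabled-blueprint');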
Just FYI, I ran provisioning with 1.14.1 for a cluster that resembles your setup (I could not directly reproduce it, as I don't have access to your env settings and the AMI version that you use).
// Imports plus scope/id added so the snippet is self-contained; in the original
// setup these would come from the surrounding CDK app/construct.
import { App } from 'aws-cdk-lib';
import { KubernetesVersion } from 'aws-cdk-lib/aws-eks';
import { InstanceClass, InstanceSize, InstanceType } from 'aws-cdk-lib/aws-ec2';
import * as blueprints from '@aws-quickstart/eks-blueprints';

const scope = new App();
const id = 'my-cluster';
const stackID = `${id}-blueprint`;

const clusterProps: blueprints.MngClusterProviderProps = {
    version: KubernetesVersion.V1_29,
    nodegroupName: 'my-ng',
    instanceTypes: [InstanceType.of(InstanceClass.M5, InstanceSize.LARGE)],
    minSize: 1,
    maxSize: 3
};
console.log(`clusterProps: ${JSON.stringify(clusterProps)}`);

const clusterProvider = new blueprints.MngClusterProvider(clusterProps);

blueprints.EksBlueprint.builder()
    .clusterProvider(clusterProvider)
    .addOns(
        new blueprints.AwsLoadBalancerControllerAddOn(),
        new blueprints.VpcCniAddOn(),
        new blueprints.MetricsServerAddOn(),
        new blueprints.ClusterAutoScalerAddOn(),
    )
    .teams()
    .build(scope, stackID);
Provisioned the cluster with 1.14.1, then upgraded the blueprints to 1.15.1 and reran deploy. I got no errors, and all add-ons were upgraded to the newer versions (e.g. load balancer controller, metrics server, etc.). That also confirms the experience of other customers who did not have issues with the log group when upgrading.
I will need a full blueprint example to reproduce.
Thanks so much @shapirov103. Looks like the problem was with 1.15.0 and not with 1.15.1. It now works for me.
I'm also seeing failures when going from 1.14.1 to 1.15.1; my stack does have control plane logging enabled.
Resource handler returned message: "Resource of type 'AWS::Logs::LogGroup' with identifier '{"/properties/LogGroupName":"/aws/vendedlogs/states/waiter-state-machine-STACKNAME-ProviderframeworkisCompl-S6XDAkzUUmoq-c8b1cfed19641073278d59059a5ed9e648e1781c7c"}' already exists." (RequestToken: 5fd41341-3e15-f2e5-826f-2f51001f349e, HandlerErrorCode: AlreadyExists)
@paulchambers These logs are not produced by the blueprints; they represent Lambda logs for the custom resources in the CDK native implementation. I see a somewhat related issue about it on the CDK repo here.
If you can drop the log groups similarly to what vpopiolrccl described, that would resolve it. Please also consider running the latest cdk bootstrap on the account/region.
If the problem persists, please share the blueprint to reproduce the issue.
@shapirov103 Manually removing the log group does clear the error, but I'm seeing it on each cluster that I upgrade to 1.15.1.
When going from 1.14.1 to 1.15.1, the first deploy fails with "No changes needed for the logging config provided" from the Custom::AWSCDK-EKS-Cluster resource.
The second attempt fails with the log group error as above.
Removing the log group then allows the deploy to succeed.
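As a side note, a minimal sketch of what that cleanup step could look like with the AWS SDK for JavaScript v3 instead of the console. The log group name below is a placeholder; substitute the exact name from the AlreadyExists error, and note the call needs logs:DeleteLogGroup permission in the target account/region.

import { CloudWatchLogsClient, DeleteLogGroupCommand } from '@aws-sdk/client-cloudwatch-logs';

// Placeholder: copy the exact log group name reported in the CloudFormation error.
const logGroupName = '/aws/vendedlogs/states/waiter-state-machine-<stack-specific-suffix>';

async function deleteConflictingLogGroup(): Promise<void> {
    const client = new CloudWatchLogsClient({});
    await client.send(new DeleteLogGroupCommand({ logGroupName }));
    console.log(`Deleted log group ${logGroupName}`);
}

deleteConflictingLogGroup().catch(console.error);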
@paulchambers as I mentioned in https://github.com/aws-quickstart/cdk-eks-blueprints/issues/1036#issuecomment-2204385772, in my test I provisioned a cluster with 1.14.1, upgraded to 1.15.1, and was able to deploy successfully; all add-ons were updated to the latest version. It could be an issue specific to the CDK upgrade, as these log groups are created by the CDK implementation.
If there is an example that I can use to reproduce the issue, I am happy to give it a shot; if needed, I will create an issue against CDK.
About to close this; please let me know if anyone still has trouble with the Step Functions log group.
Describe the bug
When trying to cdk deploy a cluster previously created with @aws-quickstart/eks-blueprints v1.14.1 after upgrading to @aws-quickstart/eks-blueprints v1.15.0, I get errors in the CloudFormation events.
Expected Behavior
No changes should be made to the cluster, as nothing changed in the stack.
Current Behavior
The Cluster Provider nested stack produces this error when creating the Provider Waiter State Machine: Resource handler returned message: "Resource of type 'AWS::Logs::LogGroup' with identifier '{"/properties/LogGroupName":"/aws/vendedlogs/states/waiter-state-machine-rcg-ecom-cluster-sandbox--ProviderframeworkisCompl-q4ar3IV7b2Li-c823a05924272663236e0df94090e3304c5d23966c"}' already exists." (RequestToken: 48ba77a8-b8d7-7e17-71f3-1e29a5cfca0d, HandlerErrorCode: AlreadyExists)
Reproduction Steps
Possible Solution
No response
Additional Information/Context
No response
CDK CLI Version
2.147.1
EKS Blueprints Version
1.15.0
Node.js Version
21.6.1
Environment details (OS name and version, etc.)
Mac OS 14.5
Other information
No response