aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0

aws-eks: Configure logging for cluster resource provider #31745

Open patchwork01 opened 1 month ago

patchwork01 commented 1 month ago

Describe the feature

We'd like to be able to configure the logging settings for the custom resource provider used to deploy EKS clusters, in particular to set the log groups for the lambdas and the provider. We need to be able to do this via JSii.

Use Case

We'd like to be able to deploy EKS clusters in a context where log retention is fixed at the AWS account level. In that context, any stack that attempts to create a CloudWatch log group without explicitly setting infinite log retention fails to deploy, so that the account configuration takes precedence.
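For illustration, here's a minimal sketch of a log group that deploys cleanly under that policy because it explicitly sets infinite retention (the stack and construct names are just examples):

```ts
import { Stack, StackProps } from 'aws-cdk-lib';
import * as logs from 'aws-cdk-lib/aws-logs';
import { Construct } from 'constructs';

// Example only: a log group that satisfies an account-level retention policy
// by explicitly setting infinite retention.
export class LoggingExampleStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    new logs.LogGroup(this, 'ExampleLogGroup', {
      retention: logs.RetentionDays.INFINITE,
    });
  }
}
```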

We also need to be able to remove the EKS cluster from an existing deployment and later add it back with the same configuration. We currently do this with an optional nested stack that is removed or added based on configuration. We'd like the log groups to be retained when the optional stack is removed, and to have the cluster reattach to the same log groups when the stack is added back. We're trying to achieve this by declaring the log groups in a separate stack (sketched below). Without that, removing the optional stack leaves the log groups behind, and adding it back later fails to deploy because the log groups already exist.

This includes any log groups associated with a resource provider, such as implicit log groups for a lambda or a waiter state machine for a custom resource.
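To make the pattern concrete, here's a rough sketch of what we do today for our own lambdas, and what we'd like to be able to do for the log groups created by the cluster resource provider. All names and the environment-variable toggle are illustrative only, not our real setup:

```ts
import { App, Stack, StackProps } from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as logs from 'aws-cdk-lib/aws-logs';
import { Construct } from 'constructs';

// Stack that owns the log groups. It stays deployed even when the optional
// stack is removed, so the groups remain CDK-managed throughout.
class LogGroupsStack extends Stack {
  readonly functionLogGroup: logs.ILogGroup;

  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);
    this.functionLogGroup = new logs.LogGroup(this, 'FunctionLogs', {
      retention: logs.RetentionDays.INFINITE,
    });
  }
}

interface OptionalStackProps extends StackProps {
  readonly functionLogGroup: logs.ILogGroup;
}

// Optional stack that reuses the externally managed log group, so it can be
// removed and re-added without a log group name collision.
class OptionalStack extends Stack {
  constructor(scope: Construct, id: string, props: OptionalStackProps) {
    super(scope, id, props);
    new lambda.Function(this, 'Handler', {
      runtime: lambda.Runtime.NODEJS_20_X,
      handler: 'index.handler',
      code: lambda.Code.fromInline('exports.handler = async () => {};'),
      logGroup: props.functionLogGroup, // reattach to the same group on re-deploy
    });
  }
}

const app = new App();
const logGroups = new LogGroupsStack(app, 'LogGroups');
if (process.env.DEPLOY_OPTIONAL_STACK === 'true') { // hypothetical toggle
  new OptionalStack(app, 'Optional', { functionLogGroup: logGroups.functionLogGroup });
}
```

The ask in this issue is to be able to apply the same pattern to the log groups the EKS cluster resource provider creates implicitly.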

Proposed Solution

It looks like the relevant resources are deployed in a nested stack, ClusterResourceProvider (see cluster-resource-provider.ts). That nested stack is discovered by ClusterResource and FargateProfile; if it already exists in the context, those resources reuse the preconfigured instance. It's not currently possible to set log groups for that nested stack, or to create an instance of it via JSii.

Any way to configure log groups for that nested stack via JSii would solve the problem, whether it's exposed on the cluster or on a separate entity that only configures the nested stack.

We'll need to be able to set log groups for the two lambdas deployed under ClusterResourceProvider, for the lambda deployed under the Provider there, and for the waiter state machine that's also deployed under the Provider. ProviderProps already has the settings waiterStateMachineLogOptions, logRetention and logGroup; those could be set from ClusterResourceProvider.
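As a very rough sketch of the kind of configuration surface we have in mind (the interface and property names below are made up; only the ProviderProps settings mentioned above exist today):

```ts
import * as logs from 'aws-cdk-lib/aws-logs';
import * as cr from 'aws-cdk-lib/custom-resources';

// Hypothetical shape only -- none of these EKS-side properties exist today.
// The idea is that whatever is passed here gets forwarded to the resources
// created inside the ClusterResourceProvider nested stack.
export interface ClusterResourceProviderLoggingProps {
  /** Log group for the onEvent/isComplete handler lambdas. */
  readonly handlerLogGroup?: logs.ILogGroup;
  /** Forwarded to ProviderProps.logGroup (the framework lambda). */
  readonly providerLogGroup?: logs.ILogGroup;
  /** Forwarded to ProviderProps.waiterStateMachineLogOptions. */
  readonly waiterStateMachineLogOptions?: cr.LogOptions;
}
```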

Other Information

This causes the following issues, which we will not be able to resolve until this feature is available:

Acknowledgements

CDK version used

v2.162.1

Environment details (OS name and version, etc.)

Ubuntu 24.04

pahud commented 1 month ago

Not specific to aws-eks, but we have a similar issue at https://github.com/aws/aws-cdk/issues/30777

and there is CustomResourceConfig for setting the log retention lifetime: https://github.com/aws/aws-cdk/tree/main/packages/aws-cdk-lib/custom-resources#setting-log-retention-lifetime
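Something along these lines (the retention value is just an example):

```ts
import { App } from 'aws-cdk-lib';
import * as logs from 'aws-cdk-lib/aws-logs';
import { CustomResourceConfig } from 'aws-cdk-lib/custom-resources';

const app = new App();

// Applies a log retention lifetime to the log groups of CDK-vended custom
// resource providers in the app (see the linked README section).
CustomResourceConfig.of(app).addLogRetentionLifetime(logs.RetentionDays.TEN_YEARS);
```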

Are you able to check if CustomResourceConfig works for you?

patchwork01 commented 1 month ago

I've just tried deploying with the log retention lifetime set to infinite via CustomResourceConfig.addLogRetentionLifetime, and it didn't apply to the EKS cluster custom resource provider. I used our account that disallows setting log group retention, and the deployment failed on the cluster provider waiter state machine log group because it was still trying to set the retention to 2 years.

I also don't think this would fully solve our use case even if it did work; I'll update the issue description. We have the EKS cluster in an optional nested stack that is not deployed by default, and the user can add it to or remove it from a deployment by configuration. If a log group is deployed in the optional stack, it is retained when the stack is removed, but when the stack is added back later the deployment fails because the log group already exists.

I think the right way to do this is to declare the log groups in a separate stack and keep them managed by the CDK whether the optional stack is enabled or not, but I'd be glad to hear if there's a better way to do it.