Closed. trks1970 closed this issue 10 months ago.
@trks1970 This makes sense to me. Before we dive deep into the fix, can you provide reproduction steps with CDK code that can help us quickly reproduce this issue from our end?
This issue has not received a response in a while. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.
Will take some time to provide an example.
Hi @pahud, we are running into this too with some CDK-created CloudWatch log group policies:
Example 1, sending cloudtrail events to eventbridge and logging the events:
const loginEvent = aws_cloudtrail.Trail.onEvent(this, 'login-event', {
  eventPattern: {
    source: ['aws.signin'],
    detailType: ['AWS Console Sign In via CloudTrail'],
  },
});
loginEvent.addTarget(new targets.CloudWatchLogGroup(new logs.LogGroup(this, 'login-event-logs')));
Example 2, creating an OpenSearch domain:
new aws_opensearchservice.Domain(this, 'domain', {
  version: aws_opensearchservice.EngineVersion.OPENSEARCH_1_1, // version is required by DomainProps
  // Specifically enabling logging here
  logging: {
    slowSearchLogEnabled: true,
    appLogEnabled: true,
    slowIndexLogEnabled: true,
  },
});
@mvs5465
Are you getting the "Resource limit exceeded" error just with this? Can you provide a full sample that reproduces this error?
const loginEvent = aws_cloudtrail.Trail.onEvent(this, 'login-event', {
  eventPattern: {
    source: ['aws.signin'],
    detailType: ['AWS Console Sign In via CloudTrail'],
  },
});
loginEvent.addTarget(new targets.CloudWatchLogGroup(new logs.LogGroup(this, 'login-event-logs')));
@pahud We already have 10 CloudWatch log group policies in our account. Running the code you mentioned will create one more, hence the error.
OK I see.
Looking at this https://github.com/aws/aws-cdk/blob/cea1039e3664fdfa89c6f00cdaeb1a0185a12678/packages/%40aws-cdk/aws-ecs/lib/log-drivers/aws-log-driver.ts#L119
When we new AwsLogDriver() for a container definition, there is always a grantWrite() call, which adds a new log group policy to the log group resource.
I don't have an immediate workaround, so I am marking this p1 here for more visibility. Meanwhile I will try to reproduce this in my account by creating 10 containers with the same log group.
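For reference, the grant at the linked line happens inside AwsLogDriver.bind(); a rough paraphrase (not the exact source) of what runs for every container definition:

// inside AwsLogDriver.bind(): grant the task execution role write access to the log group
this.logGroup.grantWrite(containerDefinition.taskDefinition.obtainExecutionRole());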
@trks1970
I tried creating 10 containers with the 2 approaches below and didn't hit the error you mentioned.
import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';

export class EcsTsStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const logGroup = new cdk.aws_logs.LogGroup(this, 'LG');
    const task = new cdk.aws_ecs.FargateTaskDefinition(this, 'Task', {
      cpu: 512,
      memoryLimitMiB: 4096,
    });
    const task2 = new cdk.aws_ecs.FargateTaskDefinition(this, 'Task2', {
      cpu: 512,
      memoryLimitMiB: 4096,
    });

    // this works: 10 containers sharing one log group
    for (let x = 0; x < 10; x++) {
      task.addContainer(`nginx${x}`, {
        image: cdk.aws_ecs.ContainerImage.fromRegistry('nginx:latest'),
        memoryLimitMiB: 256,
        logging: new cdk.aws_ecs.AwsLogDriver({
          logGroup,
          streamPrefix: 'demo-',
        }),
      });
    }

    // this works as well: 10 containers, each with its own log group
    for (let x = 0; x < 10; x++) {
      task2.addContainer(`nginx${x}`, {
        image: cdk.aws_ecs.ContainerImage.fromRegistry('nginx:latest'),
        memoryLimitMiB: 256,
        logging: new cdk.aws_ecs.AwsLogDriver({
          logGroup: new cdk.aws_logs.LogGroup(this, `LG${x}`),
          streamPrefix: 'demo-',
        }),
      });
    }
  }
}
Can you provide me with the full cdk code that reproduces your error?
OK I can reproduce this as below:
// create 10 events, each having a log group as the event target
for (let x = 0; x < 10; x++) {
  const loginEvent = cdk.aws_cloudtrail.Trail.onEvent(this, `login-event${x}`, {
    eventPattern: {
      source: ['aws.signin'],
      detailType: ['AWS Console Sign In via CloudTrail'],
    },
  });
  loginEvent.addTarget(new cdk.aws_events_targets.CloudWatchLogGroup(
    new cdk.aws_logs.LogGroup(this, `login-event-logs${x}`),
  ));
}
This will create 10 CloudWatch Logs resource policies, and according to the doc the hard limit is 10 per account, hence the error.
The policies can be listed with aws logs describe-resource-policies.
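As a quick sanity check, the policy count can be read directly with the CLI's --query option (resourcePolicies is the field name in the DescribeResourcePolicies response):

# count the existing CloudWatch Logs resource policies in the current region
aws logs describe-resource-policies --query 'length(resourcePolicies)'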
I think this is not a bug but a limitation of CloudWatch Logs resource policies. From CDK's perspective we could probably provide some workaround, such as opting out of the resource policy creation, but I am not sure if this is a good idea.
I am marking this as a p1 feature request for more visibility from the team.
Similar Issue: https://github.com/aws/aws-cdk/issues/20313
I have come up with a possible workaround.
When I moved the LogGroup creation to a separate stack, the ResourcePolicy was not created.
In the TaskDefinition stack, I referenced the created LogGroup by ARN via LogGroup.from_log_group_arn().
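A minimal TypeScript sketch of this workaround (stack and construct names are hypothetical; from_log_group_arn is the Python spelling of fromLogGroupArn):

import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';

// Stack that owns the log group and exposes its ARN
class LogGroupStack extends cdk.Stack {
  public readonly logGroupArn: string;
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);
    this.logGroupArn = new cdk.aws_logs.LogGroup(this, 'TaskLogs').logGroupArn;
  }
}

// Stack that imports the log group by ARN instead of creating it
class TaskStack extends cdk.Stack {
  constructor(scope: Construct, id: string, logGroupArn: string, props?: cdk.StackProps) {
    super(scope, id, props);
    const logGroup = cdk.aws_logs.LogGroup.fromLogGroupArn(this, 'ImportedLogs', logGroupArn);
    const task = new cdk.aws_ecs.FargateTaskDefinition(this, 'Task');
    task.addContainer('app', {
      image: cdk.aws_ecs.ContainerImage.fromRegistry('nginx:latest'),
      logging: new cdk.aws_ecs.AwsLogDriver({ logGroup, streamPrefix: 'app' }),
    });
  }
}

const app = new cdk.App();
const logStack = new LogGroupStack(app, 'LogGroupStack');
new TaskStack(app, 'TaskStack', logStack.logGroupArn);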
Any update on this?
@sakurai-ryo Just saw the pull request on this. When do we expect this change to be available? Is there any local workaround we can use? Thanks
@XiaowenMaoA The CDK core team is currently on vacation, so the review will be done after the vacation. Please check the other issues for workarounds. https://github.com/aws/aws-cdk/issues/20313#issuecomment-1861126356
If this doesn't work, please send me the code and I may be able to suggest a workaround.
@sakurai-ryo We are not inside an ECS task. Our create-OpenSearch-domain logic is like the following. We have more than 10 TenantNames, and we create an OpenSearch Domain for each TenantName.
import { App, RemovalPolicy } from 'aws-cdk-lib';
import { IVpc, SecurityGroup } from 'aws-cdk-lib/aws-ec2';
import { Effect, PolicyStatement, ServicePrincipal } from 'aws-cdk-lib/aws-iam';
import { LogGroup, RetentionDays } from 'aws-cdk-lib/aws-logs';
import { Domain, EncryptionAtRestOptions, EngineVersion } from 'aws-cdk-lib/aws-opensearchservice';

export class OpenSearchClusterStack extends DeploymentStack {
  public kmsKey: KmsKey;
  public cloudwatchPolicy: PolicyStatement;

  constructor(parent: App, name: string, props: OpenSearchClusterStackProps) {
    super(parent, name, {
      env: props.env,
      softwareType: SoftwareType.LONG_RUNNING_SERVICE,
      stackName: props.stackName,
      tags: props.tags,
    });

    const vpc = Utils.getVpc(this, this.region);
    const securityGroup = this.createSecurityGroup(vpc, props);
    this.kmsKey = this.createKmsKey(vpc, props);
    this.cloudwatchPolicy = new PolicyStatement({
      actions: ['logs:CreateLogStream', 'logs:PutLogEvents'],
      resources: ['*'],
      effect: Effect.ALLOW,
      principals: [new ServicePrincipal('es.amazonaws.com')],
    });

    // create an OS domain for each TenantName
    Object.values(TenantNames).forEach((tenantNameStr) => {
      const tenantName = tenantNameStr as TenantNames;
      const clusterConfig = props.stage.clusterConfigByTenant[tenantName];
      const globaltenantConfig = TENANT_CONFIG[tenantName];
      const domainName = Utils.generateName(globaltenantConfig.clusterName, props.isOnebox)
        .toLowerCase();
      const domainAppLogGroupName = `${Utils.generateName(`${Constants.Resource.PREFIX}-OSDomain-AppLogGroup`, props.isOnebox)}-${domainName}`;
      const domainAppLogGroup = new LogGroup(this, domainAppLogGroupName, {
        logGroupName: domainAppLogGroupName,
        retention: RetentionDays.TEN_YEARS,
        removalPolicy: RemovalPolicy.RETAIN_ON_UPDATE_OR_DELETE,
      });
      const domain = this.createDomain(
        domainName,
        { enabled: true, kmsKey: this.kmsKey.key },
        domainAppLogGroup,
        vpc,
        securityGroup,
        props,
        clusterConfig,
      );
      domain.addAccessPolicies(this.cloudwatchPolicy);
    });
  }

  private createDomain(
    domainName: string,
    encryptionProperties: EncryptionAtRestOptions,
    appLogGroup: LogGroup,
    vpc: IVpc,
    securityGroup: SecurityGroup,
    props: OpenSearchClusterStackProps,
    clusterConfig: ClusterConfig,
  ) {
    return new Domain(this, domainName, {
      domainName,
      version: EngineVersion.OPENSEARCH_1_1,
      vpc,
      zoneAwareness: {
        enabled: true,
        availabilityZoneCount: vpc.privateSubnets.length,
      },
      capacity: {
        masterNodes: clusterConfig.masterNodeCount,
        masterNodeInstanceType: clusterConfig.masterNodeType,
        dataNodes: clusterConfig.instanceCount,
        dataNodeInstanceType: clusterConfig.instanceType,
      },
      ebs: {
        volumeSize: clusterConfig.instanceVolumeSize,
        volumeType: clusterConfig.ebsVolumeType,
        iops: 18000,
      },
      securityGroups: [securityGroup],
      nodeToNodeEncryption: true,
      encryptionAtRest: encryptionProperties,
      enforceHttps: true,
      logging: {
        appLogEnabled: true,
        appLogGroup,
      },
      removalPolicy: RemovalPolicy.RETAIN_ON_UPDATE_OR_DELETE,
    });
  }
}
Thanks @XiaowenMaoA
Looking at this part of the code, it appears that we are currently using a custom resource to create the ResourcePolicy for the logs. https://github.com/aws/aws-cdk/blob/9c0fecf9a5cc26416e814ae4754729b54c827b9d/packages/aws-cdk-lib/aws-opensearchservice/lib/domain.ts#L1717
Since the custom resource is executed by CloudFormation after cdk deploy, it is difficult to remove the ResourcePolicy in the CDK code, for example by using the tryRemoveChild method.
It may be possible to work around this by using a new custom resource and rewriting or deleting the ResourcePolicy, though.
The PR I created is an ECS fix, and even if it were merged, it would not solve the problem on the OpenSearch side. Would you be so kind as to create a new issue for us to discuss this matter further? Note that a ResourcePolicy can now be created in CloudFormation without the custom resource, and that the ResourcePolicy should be reused as described in this document. https://docs.aws.amazon.com/opensearch-service/latest/developerguide/createdomain-configure-slow-logs.html#:~:text=name%22%0A%20%20%20%20%7D%0A%7D-,Important,-CloudWatch%20Logs%20supports
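To illustrate the reuse idea, a minimal sketch of one shared policy created with the aws_logs.ResourcePolicy L2 construct (the policy name and log-group ARN pattern are hypothetical, and the Domain construct may still create its own policy via the custom resource when logging is enabled):

import { aws_iam as iam, aws_logs as logs } from 'aws-cdk-lib';

// one account-wide resource policy shared by all OpenSearch log groups
new logs.ResourcePolicy(this, 'SharedOpenSearchLogPolicy', {
  resourcePolicyName: 'shared-opensearch-log-policy', // hypothetical name
  policyStatements: [
    new iam.PolicyStatement({
      actions: ['logs:CreateLogStream', 'logs:PutLogEvents'],
      principals: [new iam.ServicePrincipal('es.amazonaws.com')],
      // hypothetical pattern matching all OpenSearch app log groups
      resources: [`arn:aws:logs:${this.region}:${this.account}:log-group:/aws/opensearch/*`],
    }),
  ],
});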
@sakurai-ryo I found a related issue created earlier: https://github.com/aws/aws-cdk/issues/23637 Can we use this to track the log resource policy in the OpenSearch case? This is blocking our prod readiness because we are scaling up to more than 10 domains for sure.
I cannot find a way to pass a shared ResourcePolicy while keeping logging enabled. Is there any possible workaround?
Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.
@XiaowenMaoA Yes, of course. As I mentioned in my previous comment, it may be difficult to avoid this problem, but I will consider a PR to solve it.
Any update on this? We don't want to add policies to our roles, but we are forced to do so when defining the logs.
Hi @stoicad. Do you still have this problem when using aws-log-driver?
Hi @sakurai-ryo, yes we do. We are not allowed to touch the existing roles via CDK, but when defining this one:
public createContainer(taskDefinition: FargateTaskDefinition): void {
  // TODO: fix logging on container
  // Tried with Log Group ARN but still not working: "arn:aws:logs:us-east-1::log-group:/ecs/web-taskdefinition:*"
  // logging: LogDrivers.awsLogs({ streamPrefix: `project-name-${this.ecsContextParams.environment}-web` })
  const container = taskDefinition.addContainer(`project-name-${this.ecsContextParams.environment}-web-container`, {
    image: ContainerImage.fromRegistry(this.envConfig.WEB_ECS_IMAGE),
    // logging: LogDrivers.awsLogs({ streamPrefix: `project-name-${this.ecsContextParams.environment}-web` })
  });
  container.addPortMappings({ containerPort: 80 });
  container.addPortMappings({ containerPort: 443 });
}
it tries to add permissions to the role attached to the ECS task (as a separate policy), even though logs:* is already available on the role.
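One way to keep CDK from attaching grants to a pre-existing role is to import it as immutable, so grantWrite() becomes a no-op on it; a sketch, assuming the execution role ARN is known (the ARN and names here are hypothetical):

import { aws_ecs as ecs, aws_iam as iam } from 'aws-cdk-lib';

// import the existing execution role as immutable; CDK will not add policies to it
const executionRole = iam.Role.fromRoleArn(this, 'ExecRole',
  'arn:aws:iam::123456789012:role/existing-ecs-execution-role', // hypothetical ARN
  { mutable: false },
);
const taskDefinition = new ecs.FargateTaskDefinition(this, 'WebTask', { executionRole });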
Thanks @stoicad. Do the existing roles refer to the ECS execution role? Could you share the entire code and errors to help us pinpoint the problem?
Describe the bug
We are running 12 Fargate containers, each of them logging to CloudWatch via LogDriver, created with CDK.
However, the last 2 fail to deploy, due to
"Resource limit exceeded. (Service: CloudWatchLogs, Status Code: 400,
a limit which I understand is fixed at 10 with no way to change it.
However, it looks like a log resource policy is created automatically by the LogDriver, which uses up that quota.
Expected Behavior
It should be possible either a) not to create a policy or b) to reuse an existing policy.
Current Behavior
A log-resource-policy is created with each LogDriver.awsLogs call.
Reproduction Steps
Deploy more than 10 containers with LogDriver
Possible Solution
No response
Additional Information/Context
No response
CDK CLI Version
2.40.0 (build 56ba2ab)
Framework Version
aws-cdk-lib:2.27.0
Node.js Version
16.15.1
OS
Win10
Language
Java
Language Version
Kotlin
Other information
No response