Describe the bug
Opening on behalf of an internal user
When CloudFormation attempts to delete an OpenSearch VectorIndex resource, the user sees repeated CloudFormation errors (DELETE_FAILED), although the CloudFormation deployment eventually succeeds anyway.
The issue adds minutes to the CloudFormation stack deployment time because CloudFormation repeatedly attempts to delete the OpenSearch VectorIndex, fails, and then backs off.
Expected Behavior
The OpenSearch index is deleted gracefully, without errors.
Current Behavior
Deletion repeatedly fails with DELETE_FAILED errors before it eventually succeeds.
Reproduction Steps
None as of now; a code snippet is still needed to reproduce the issue.
Possible Solution
The custom resource (CR) Lambda uses this IAM policy: https://github.com/awslabs/generative-ai-cdk-constructs/blob/main/src/cdk-lib/opensearchserverless/vector-collection.ts#L138-L151
The problem with that IAM policy is that it points to only a single OpenSearch collection, while the Lambda is a static resource shared by the entire CloudFormation stack. If you deploy multiple Knowledge Bases with OpenSearch collections and indexes in the same CloudFormation stack, the Lambda does not have permission to modify all of them.
The user manually modified that IAM policy to allow access to all collections within the account and confirmed that the OpenSearch index is then deleted gracefully, without errors.
We may not want to make this change directly in the CDK library, because doing so would introduce a bug in grantDataAccess, which would then grant access more broadly than it should.
Instead, we may want to create a custom IAM policy just for the custom resource Lambda that has broad access to all collections within the account, rather than pointing it at props.collection.aossPolicy as it does now.
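As a rough illustration of the proposal above, the following sketch builds the IAM policy document that a dedicated custom resource policy might contain. It is not the library's implementation: `buildCustomResourceAossPolicy` is a hypothetical helper, and the `aoss:APIAccessAll` action and the `collection/*` resource pattern are assumptions about what the existing per-collection policy grants, widened to every collection in the account. It uses plain TypeScript rather than CDK classes so that it stands alone.

```typescript
// Hypothetical helper, not part of generative-ai-cdk-constructs.
// It returns a plain IAM policy document scoped to ALL OpenSearch
// Serverless collections in one account and region.

interface PolicyStatementJson {
  Effect: 'Allow' | 'Deny';
  Action: string[];
  Resource: string[];
}

interface PolicyDocumentJson {
  Version: '2012-10-17';
  Statement: PolicyStatementJson[];
}

function buildCustomResourceAossPolicy(
  region: string,
  accountId: string,
  partition: string = 'aws',
): PolicyDocumentJson {
  return {
    Version: '2012-10-17',
    Statement: [
      {
        Effect: 'Allow',
        // Assumption: the data-plane action used by the existing policy.
        Action: ['aoss:APIAccessAll'],
        // Wildcard instead of a single collection ID, so the shared
        // Lambda can reach every collection deployed in the stack.
        Resource: [`arn:${partition}:aoss:${region}:${accountId}:collection/*`],
      },
    ],
  };
}

// Example usage with placeholder region and account values.
const policy = buildCustomResourceAossPolicy('us-east-1', '123456789012');
console.log(JSON.stringify(policy, null, 2));
```

Keeping this policy separate from props.collection.aossPolicy would leave grantDataAccess scoped to a single collection. Note that OpenSearch Serverless also enforces its own data access policies, which this sketch does not cover.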
Additional Information/Context
No response
CDK CLI Version
2.154.1
Framework Version
No response
Node.js Version
20
OS
macOS
Language
TypeScript, Python, .NET, Go
Language Version
No response
Region experiencing the issue
any
Code modification
no
Other information
No response
Service quota
[X] I have reviewed the service quotas for this construct