awslabs / generative-ai-cdk-constructs

AWS Generative AI CDK Constructs are sample implementations of AWS CDK for common generative AI patterns.
https://awslabs.github.io/generative-ai-cdk-constructs/
Apache License 2.0

(OpenSearch): Repeated CloudFormation errors while deleting OpenSearch VectorIndex #667

Open krokoko opened 3 months ago

krokoko commented 3 months ago

Describe the bug

Opening this on behalf of an internal user.

When CloudFormation attempts to delete an OpenSearch VectorIndex instance, the user encounters repeated CloudFormation errors (DELETE_FAILED), but the CloudFormation deployment eventually succeeds anyway.

This adds minutes to the stack deployment, because CloudFormation repeatedly tries to delete the OpenSearch VectorIndex, fails, and then backs off before retrying.

Expected Behavior

The OpenSearch index is deleted gracefully, without errors.

Current Behavior

Deletion fails repeatedly with DELETE_FAILED errors before eventually succeeding.

Reproduction Steps

None yet; a code snippet is needed to reproduce the issue.

Possible Solution

The custom resource (CR) Lambda actually uses this IAM policy: https://github.com/awslabs/generative-ai-cdk-constructs/blob/main/src/cdk-lib/opensearchserverless/vector-collection.ts#L138-L151

The problem is that this policy points to a single OpenSearch collection, while the Lambda is a single resource shared by the entire CloudFormation stack. If you deploy multiple Knowledge Base + OpenSearch collection and index combinations in the same stack, the Lambda does not have permission to modify all of them. The user manually modified the IAM policy to allow access to all collections in the account and confirmed that the OpenSearch index was then deleted gracefully, without errors.
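To make the permission gap concrete, here is a simplified TypeScript model of IAM resource matching. It only handles the `*` and `?` wildcards and ignores actions, conditions, and explicit denies. The account ID, region, and collection IDs are hypothetical, and the sketch assumes, as described above, that the shared Lambda's policy names only the first collection's ARN.

```typescript
// Simplified model of IAM resource matching: "*" matches any run of
// characters and "?" matches exactly one. Real IAM evaluation also considers
// actions, conditions, and explicit denies; this sketch ignores all of that.
function resourceMatches(pattern: string, arn: string): boolean {
  // Escape regex metacharacters (except * and ?), then translate IAM wildcards.
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  const regex = new RegExp(
    "^" + escaped.replace(/\*/g, ".*").replace(/\?/g, ".") + "$",
  );
  return regex.test(arn);
}

const account = "123456789012"; // hypothetical account ID
const region = "us-east-1"; // hypothetical region

// Hypothetical: the shared CR Lambda's policy names only the first collection.
const singleCollectionPolicy = `arn:aws:aoss:${region}:${account}:collection/firstcollectionid`;

// Two collections deployed in the same stack (hypothetical IDs).
const firstCollection = `arn:aws:aoss:${region}:${account}:collection/firstcollectionid`;
const secondCollection = `arn:aws:aoss:${region}:${account}:collection/secondcollectionid`;

console.log(resourceMatches(singleCollectionPolicy, firstCollection)); // true
console.log(resourceMatches(singleCollectionPolicy, secondCollection)); // false
```

Under this model, any index operation the shared Lambda performs against the second collection falls outside its policy, which is consistent with the DELETE_FAILED retries described above.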

We may not want to make this change directly in the CDK library, because doing so would introduce a bug in grantDataAccess, which would then grant access more broadly than it should.

Instead, we could create a dedicated IAM policy just for the custom resource Lambda that grants access to all collections in the account, rather than pointing it at props.collection.aossPolicy as it does now.
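A minimal sketch of what such a dedicated policy might contain, written as a plain TypeScript object rather than CDK constructs so it runs on its own. The helper name `buildAccountWideAossPolicy` is hypothetical, and the sketch assumes the existing per-collection policy grants `aoss:APIAccessAll`; the only change is widening the resource from one collection ARN to `collection/*` in the account and region.

```typescript
// Minimal shape of an IAM policy document (a subset of the real grammar).
interface PolicyStatement {
  Effect: "Allow" | "Deny";
  Action: string[];
  Resource: string[];
}

interface PolicyDocument {
  Version: "2012-10-17";
  Statement: PolicyStatement[];
}

// Hypothetical helper: builds a policy for the custom resource Lambda only,
// covering every OpenSearch Serverless collection in one account and region,
// instead of reusing the per-collection props.collection.aossPolicy.
function buildAccountWideAossPolicy(
  partition: string,
  region: string,
  account: string,
): PolicyDocument {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        // Assumption: same action as the existing per-collection policy.
        Action: ["aoss:APIAccessAll"],
        // Wildcard over collection IDs instead of a single collection ARN.
        Resource: [`arn:${partition}:aoss:${region}:${account}:collection/*`],
      },
    ],
  };
}

// Hypothetical account and region, for illustration only.
const policy = buildAccountWideAossPolicy("aws", "us-east-1", "123456789012");
console.log(JSON.stringify(policy, null, 2));
```

In the construct itself this would presumably become an `iam.PolicyStatement` attached only to the custom resource Lambda's role, leaving `grantDataAccess` and `props.collection.aossPolicy` scoped to a single collection. OpenSearch Serverless also evaluates data access policies, so the Lambda role would still need to be listed as a principal there for each collection.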

Additional Information/Context

No response

CDK CLI Version

2.154.1

Framework Version

No response

Node.js Version

20

OS

macOS

Language

TypeScript, Python, .NET, Go

Language Version

No response

Region experiencing the issue

any

Code modification

no

Other information

No response

Service quota

scottschreckengaust commented 3 weeks ago

I was unable to reproduce this. Is there a particular example I can follow up with to fix it?