New serverless pattern - EventBridge-Bedrock-S3-AOSS

rajavaid77 commented 2 months ago

*Issue #, if available:* 2382

Description of changes: Added a serverless pattern that automates syncing an S3 data source to a knowledge base for Amazon Bedrock using EventBridge Scheduler. An EventBridge schedule is configured to run at a regular rate (default of 5 minutes). The schedule uses a universal target to call the StartIngestionJob API on the Amazon Bedrock Agent service. The StartIngestionJob operation in turn syncs the data source so that it is re-indexed into the knowledge base. Syncing is incremental, so Amazon Bedrock only processes documents added, modified, or deleted since the last sync.
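As a rough illustration of the schedule described above, the sketch below builds the request parameters for an EventBridge Scheduler schedule whose universal target calls StartIngestionJob. It is not taken from the pattern's source: the function name `build_schedule_request`, the schedule name, and the example IDs are hypothetical, and the target ARN and the PascalCase input keys are assumptions based on the documented universal-target format `arn:aws:scheduler:::aws-sdk:<service>:<apiAction>`.

```python
import json

# Assumed universal-target ARN for the Bedrock Agent StartIngestionJob action.
START_INGESTION_JOB_TARGET = (
    "arn:aws:scheduler:::aws-sdk:bedrockagent:startIngestionJob"
)


def build_schedule_request(knowledge_base_id, data_source_id, role_arn,
                           rate_minutes=5, name="bedrock-kb-sync"):
    """Return keyword arguments for an EventBridge Scheduler CreateSchedule call.

    All argument values are supplied by the caller; nothing here contacts AWS.
    """
    if rate_minutes < 1:
        raise ValueError("rate_minutes must be at least 1")
    # Rate expressions use the singular unit for a value of 1.
    unit = "minute" if rate_minutes == 1 else "minutes"
    return {
        "Name": name,  # hypothetical schedule name
        "ScheduleExpression": f"rate({rate_minutes} {unit})",
        "FlexibleTimeWindow": {"Mode": "OFF"},
        "Target": {
            "Arn": START_INGESTION_JOB_TARGET,
            # Role that EventBridge Scheduler assumes to call the API.
            "RoleArn": role_arn,
            # Request payload passed to StartIngestionJob (assumed key casing).
            "Input": json.dumps({
                "KnowledgeBaseId": knowledge_base_id,
                "DataSourceId": data_source_id,
            }),
        },
    }
```

The returned dictionary could then be passed to `boto3.client("scheduler").create_schedule(**request)`, provided the role is allowed to call `bedrock:StartIngestionJob` on the knowledge base.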

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

biswanathmukherjee commented 1 month ago

I am getting this error while deploying:

1:26:17 PM | CREATE_FAILED | AWS::Logs::Delivery | BedrockKBDelivery Resource handler returned message: "Supplied Policy document is breaching Cloudwatch Logs policy length limit. (Service: AWSLogs; Status Code: 400; Error Code: AccessDeniedException; Request ID: 5f1e6680-e223-4073-b440-033fadb60a0b; Proxy: null)" (RequestToken: e55167e6-3cf6-7017-b968-b5307db4d3e5, Handler ErrorCode: AccessDenied)
1:26:17 PM | ROLLBACK_IN_PROGRESS | AWS::CloudFormation::Stack | BedrockKBStack The following resource(s) failed to create: [BedrockKBDelivery]. Rollback requested by user.
1:26:17 PM | ROLLBACK_IN_PROGRESS | AWS::CloudFormation::Stack | BedrockKBStack The following resource(s) failed to create: [BedrockKBDelivery]. Rollback requested by user.

biswanathmukherjee commented 1 month ago

It seems that the pattern will keep calling the Bedrock API periodically to initiate a KB sync even if the source files have not changed. If so, that will incur unnecessary API invocation costs. Instead, can we enhance this pattern to trigger the sync whenever a source file changes, and sync only that specific file from the bucket?

rajavaid77 commented 1 month ago

> It seems that the pattern will keep calling the Bedrock API periodically to initiate a KB sync even if the source files have not changed. If so, that will incur unnecessary API invocation costs. Instead, can we enhance this pattern to trigger the sync whenever a source file changes, and sync only that specific file from the bucket?

Amazon Bedrock only processes documents added, modified, or deleted since the last sync, so cost is incurred only when there are changes to apply. The pattern also caters to the use case where customers want a scheduled sync, so they can sync the KB periodically rather than on every update to the data source, for example in a daily or weekly batch inference scenario. Customers can choose the schedule frequency based on their requirements. But yes, another pattern or approach could use S3 events to trigger StartIngestionJob. That would work for S3 but not for other data sources such as the web crawler.
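For comparison, the S3-event alternative mentioned above might look roughly like the Lambda handler below. This is a sketch, not part of this pattern: the environment variable names `KNOWLEDGE_BASE_ID` and `DATA_SOURCE_ID` and the optional `client` parameter (added so the function can be exercised without AWS access) are assumptions. Note that StartIngestionJob still syncs the whole data source incrementally; the request takes no single-file argument.

```python
import os


def handler(event, context, client=None):
    """Start a knowledge base ingestion job when S3 reports object changes.

    Returns the ingestion job ID, or None if the event carries no records.
    """
    if client is None:
        # Imported lazily so the module loads even where boto3 is absent.
        import boto3
        client = boto3.client("bedrock-agent")

    records = event.get("Records", [])
    if not records:
        return None  # nothing changed, so no sync is requested

    # One job covers every change in the event; Bedrock works out which
    # documents were added, modified, or deleted since the last sync.
    response = client.start_ingestion_job(
        knowledgeBaseId=os.environ["KNOWLEDGE_BASE_ID"],  # assumed env var
        dataSourceId=os.environ["DATA_SOURCE_ID"],        # assumed env var
    )
    return response["ingestionJob"]["ingestionJobId"]
```

A burst of uploads would invoke this handler many times in quick succession, and overlapping jobs on the same data source may be rejected by the service, so a real implementation would likely need batching or error handling that this sketch omits.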

rajavaid77 commented 1 month ago

> I am getting this error while deploying:
>
> 1:26:17 PM | CREATE_FAILED | AWS::Logs::Delivery | BedrockKBDelivery Resource handler returned message: "Supplied Policy document is breaching Cloudwatch Logs policy length limit. (Service: AWSLogs; Status Code: 400; Error Code: AccessDeniedException; Request ID: 5f1e6680-e223-4073-b440-033fadb60a0b; Proxy: null)" (RequestToken: e55167e6-3cf6-7017-b968-b5307db4d3e5, Handler ErrorCode: AccessDenied)
> 1:26:17 PM | ROLLBACK_IN_PROGRESS | AWS::CloudFormation::Stack | BedrockKBStack The following resource(s) failed to create: [BedrockKBDelivery]. Rollback requested by user.
> 1:26:17 PM | ROLLBACK_IN_PROGRESS | AWS::CloudFormation::Stack | BedrockKBStack The following resource(s) failed to create: [BedrockKBDelivery]. Rollback requested by user.

Fixed by updating the source ARN in the resource policy.

biswanathmukherjee commented 1 month ago

Approved from my side. Thanks for your contribution!

aws-samples / serverless-patterns

New serverless pattern - EventBridge-Bedrock-S3-AOSS #2483