awslabs / generative-ai-cdk-constructs

AWS Generative AI CDK Constructs are sample implementations of AWS CDK for common generative AI patterns.
https://awslabs.github.io/generative-ai-cdk-constructs/
Apache License 2.0
302 stars 40 forks

bedrock: initial sync for S3DataSource attached to KnowledgeBase #408

Open jlosito opened 2 months ago

jlosito commented 2 months ago

Describe the feature

I would like a property that triggers an initial sync of a Data source associated with a Knowledge base.

Use Case

I have a use case where an S3 bucket already has objects in it. I'm using this library to provision the Knowledge Base and Data source via the KnowledgeBase and S3DataSource constructs. When the stack completes, I have to go and start an ingestion job through a script, but I would much rather do this via a property.

Other constructs have similar, though not identical, functionality. For instance, the Bucket construct has an autoDeleteObjects property, which it implements via a custom resource. I think something similar could be achieved for Data sources via a custom resource that simply calls StartIngestionJob.
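To make the custom resource idea concrete, here is a minimal sketch of the SDK call and IAM permission such a resource might need. Everything in it is an assumption for illustration: `SdkCallSketch`, `planInitialSync`, and `ingestionPolicy` are hypothetical names that do not exist in this library, the interface is only a local stand-in for the shape of `AwsSdkCall` from `aws-cdk-lib/custom-resources`, and the service string and ARN layout reflect my understanding of the Bedrock Agent API rather than anything confirmed in this thread.

```typescript
// Hypothetical sketch: none of these names exist in generative-ai-cdk-constructs.
// SdkCallSketch loosely mirrors the service/action/parameters fields of
// AwsSdkCall in aws-cdk-lib/custom-resources. It is a local stand-in so the
// example runs without any dependencies.
interface SdkCallSketch {
  service: string;
  action: string;
  parameters: { knowledgeBaseId: string; dataSourceId: string };
  // The real AwsSdkCall takes a PhysicalResourceId object; a plain string is
  // used here purely for illustration.
  physicalResourceId: string;
}

interface InitialSyncInput {
  knowledgeBaseId: string;
  dataSourceId: string;
  initialSync?: boolean; // the property proposed in this issue
}

// Returns the on-create SDK call when initialSync is set, otherwise undefined.
function planInitialSync(input: InitialSyncInput): SdkCallSketch | undefined {
  if (!input.initialSync) {
    return undefined;
  }
  return {
    service: "bedrock-agent", // assumed service identifier
    action: "StartIngestionJob",
    parameters: {
      knowledgeBaseId: input.knowledgeBaseId,
      dataSourceId: input.dataSourceId,
    },
    physicalResourceId: `initial-sync-${input.knowledgeBaseId}-${input.dataSourceId}`,
  };
}

// IAM permission the custom resource's role would need (assumed ARN layout).
function ingestionPolicy(region: string, account: string, knowledgeBaseId: string) {
  return {
    actions: ["bedrock:StartIngestionJob"],
    resources: [`arn:aws:bedrock:${region}:${account}:knowledge-base/${knowledgeBaseId}`],
  };
}

// Example with placeholder IDs.
const call = planInitialSync({
  knowledgeBaseId: "KB123",
  dataSourceId: "DS456",
  initialSync: true,
});
console.log(JSON.stringify(call, null, 2));
```

In a real stack, the object returned by `planInitialSync` would presumably be passed as the `onCreate` call of an `AwsCustomResource`, with `PhysicalResourceId.of(...)` in place of the plain string. Note that StartIngestionJob only starts the job, so the stack would finish deploying before ingestion completes.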

Proposed Solution

new bedrock.S3DataSource(this, "KnowledgeBaseDataSource", {
    knowledgeBase: myKnowledgeBase,
    bucket: myBucket,
    dataSourceName: myDataSourceName,
    initialSync: true, // proposed property: start an ingestion job once the data source is created
});

Other Information

No response

1vinodsingh1 commented 1 month ago

+1, an ingestion job is needed to sync with existing S3 contents.