Description
With this repo, I want to showcase a streaming, serverless Retrieval Augmented Generation (RAG) architecture. Customers asked for a way to quickly test RAG capabilities on a small number of documents without managing infrastructure for contextual knowledge and non-parametric memory. In this pattern, the entire RAG workflow runs in a single Lambda function, so customers pay only for the infrastructure they use, when they use it. Responses are streamed through a Lambda function URL with response streaming, which gives a quicker time to first byte and a better user experience. The pattern uses Amazon Bedrock to calculate embeddings with Amazon Titan Embeddings, and any Amazon Bedrock chat model as the prediction LLM. I also provide a local pipeline to ingest your PDFs and upload them to S3, plus a very basic React application that shows how to consume the streamed responses.
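To make the retrieval step of the workflow more concrete, here is a minimal sketch of ranking document chunks against a question embedding and assembling a prompt. It assumes the embeddings have already been computed (for example with Amazon Titan Embeddings) and are held in memory. The names `Chunk`, `cosineSimilarity`, `topK`, and `buildPrompt` are hypothetical and are not taken from the repo, which may well use a vector store library instead of hand-rolled similarity.

```typescript
// Hypothetical shape for an ingested document chunk and its embedding.
interface Chunk {
  text: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
// Returns 0 when either vector has zero magnitude.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks most similar to the query embedding, best first.
function topK(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}

// Assemble a prompt for the chat model from the retrieved context.
// The wording of the prompt is an assumption made for illustration.
function buildPrompt(question: string, context: Chunk[]): string {
  const contextText = context.map((c) => c.text).join("\n---\n");
  return `Answer using only this context:\n${contextText}\n\nQuestion: ${question}`;
}
```

In a Lambda handler, the query embedding would come from an embedding model call, and the string returned by `buildPrompt` would be sent to the chosen chat model, with its output written to the response stream as it arrives.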
language
English
runtime
nodejs
Level
400
Type
Application
Use case
Interactive workload
Primary image
https://github.com/shafkevi/lambda-bedrock-s3-streaming-rag/raw/main/assets/StreamingServerlessRAG.png
IaC framework
AWS SAM
AWS Serverless services used
Description headline
One-click deployment of a fully serverless, streaming retrieval augmented generation application using Amazon Bedrock
Repo URL
https://github.com/shafkevi/lambda-bedrock-s3-streaming-rag
Additional resources
https://docs.aws.amazon.com/step-functions/latest/dg/connect-athena.html
https://arxiv.org/abs/2005.11401
https://docs.aws.amazon.com/lambda/latest/dg/configuration-response-streaming.html
https://aws.amazon.com/bedrock/titan/#Titan_Embeddings_.28generally_available.29
Author Name
Kevin Shaffer-Morrison
Author Image URL
https://kevin.shaffer-morrison.com/images/sideProfileHeadshot.jpg
Author Bio
Kevin Shaffer-Morrison is a Senior Solutions Architect at Amazon Web Services. He has helped hundreds of startups get off the ground quickly and into the cloud. Kevin focuses on helping earliest-stage founders with code samples and Twitch live streams on twitch.tv/aws.
Author Twitter handle
No response
Author LinkedIn URL
https://www.linkedin.com/in/kshaffermorrison/
leave
No response