aws-samples / amazon-sagemaker-generativeai

Repository for training and deploying Generative AI models, including text-text, text-to-image generation and prompt engineering playground using SageMaker Studio.
MIT No Attribution

Code for Retrieval Augmented Generation (RAG) question answering with Llama 2, LangChain and Pinecone using SageMaker Studio Notebooks #31

Closed: tzevelek closed this pull request 1 year ago

tzevelek commented 1 year ago

Description of changes: This notebook shows users how to use SageMaker Studio to implement RAG for fast experimentation, and later deploy their models to SageMaker endpoints.

Implement RAG in the notebook

  1. Load the Llama 2 7B chat model from Hugging Face and test question answering with LangChain
  2. Confirm that adding context to the prompt improves answer quality
  3. Ingest external PDF files into Pinecone after converting them to embeddings with the bge-small model from Hugging Face
  4. Ask a question and augment the prompt by adding the most similar document extracts from Pinecone as context
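The retrieve-and-augment step (3 and 4 above) comes down to fetching the most similar extracts from Pinecone and prepending them to the question. A minimal sketch of the prompt-augmentation logic, with the Pinecone query stubbed out; the function name and the Llama 2 chat `[INST]` prompt template are illustrative, not taken verbatim from the notebook:

```python
def build_augmented_prompt(question, context_chunks):
    """Prepend retrieved document extracts to the question,
    using a Llama 2 chat-style [INST] prompt."""
    context = "\n\n".join(context_chunks)
    return (
        "[INST] Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question} [/INST]"
    )

# In the notebook, chunks would come from a Pinecone similarity query
# over the bge-small embeddings of the question; hard-coded here.
chunks = [
    "SageMaker Studio provides managed Jupyter notebooks.",
    "Real-time endpoints serve low-latency inference.",
]
prompt = build_augmented_prompt("What does SageMaker Studio provide?", chunks)
```

The augmented prompt is then sent to the Llama 2 chat model exactly like an unaugmented question, which is what makes the with/without-context comparison in step 2 straightforward.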

From experimentation to large-scale deployment: deploy your models to SageMaker endpoints

  1. Deploy Llama 2 7B chat to a SageMaker real-time endpoint
  2. Deploy the embeddings model to a SageMaker real-time endpoint
  3. Ask the question again and augment the prompt with LangChain; this time the requests go to the SageMaker real-time endpoints
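Switching from the in-notebook model to a real-time endpoint mainly changes how requests and responses are serialized. A sketch of a LangChain-style content handler for a Llama 2 endpoint; the payload shape shown (TGI-style `inputs`/`parameters` and `generated_text`) is an assumption and should be verified against the actual deployed container:

```python
import json


class Llama2ContentHandler:
    """Serialize requests to, and deserialize responses from, a SageMaker
    real-time endpoint hosting Llama 2 chat. The JSON payload shape is
    assumed from the Hugging Face TGI container, not from this PR."""

    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt, model_kwargs):
        # Endpoint expects raw bytes of a JSON body.
        return json.dumps(
            {"inputs": prompt, "parameters": model_kwargs}
        ).encode("utf-8")

    def transform_output(self, output_bytes):
        # TGI-style responses are a list of {"generated_text": ...}.
        return json.loads(output_bytes)[0]["generated_text"]


handler = Llama2ContentHandler()
body = handler.transform_input(
    "[INST] What is RAG? [/INST]", {"max_new_tokens": 256}
)
text = handler.transform_output(
    b'[{"generated_text": "RAG combines retrieval with generation."}]'
)
```

With a handler like this, the same LangChain chain used in the notebook can point at the endpoint instead of the local model, so the RAG logic itself does not change between experimentation and deployment.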

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.