aws-samples / amazon-sagemaker-generativeai

Repository for training and deploying Generative AI models, including text-text, text-to-image generation and prompt engineering playground using SageMaker Studio.
MIT No Attribution

Code for Retrieval Augmented Generation (RAG) question answering with Llama 2, LangChain and Pinecone using SageMaker Studio Notebooks #31

Closed: tzevelek closed this pull request 1 year ago

tzevelek commented 1 year ago

Description of changes: This notebook shows users how to use SageMaker Studio to implement RAG for fast experimentation, and later deploy their models to SageMaker endpoints.

Implement RAG in the notebook

  1. Load the Llama 2 7B chat model from Hugging Face and test question answering with LangChain
  2. Confirm that adding context to the prompt improves answer quality
  3. Ingest external PDF files into Pinecone after converting them to embeddings with the bge-small model from Hugging Face
  4. Ask a question and augment the prompt by adding the most similar document extracts from Pinecone as context
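The retrieve-and-augment step (3 and 4 above) comes down to fetching the most similar extracts from Pinecone and prepending them to the question. A minimal sketch of the prompt-augmentation logic, with the Pinecone query stubbed out; the function name and the Llama 2 chat `[INST]` prompt template are illustrative, not taken verbatim from the notebook:

```python
def build_augmented_prompt(question, context_chunks):
    """Prepend retrieved document extracts to the question,
    using a Llama 2 chat-style [INST] prompt."""
    context = "\n\n".join(context_chunks)
    return (
        "[INST] Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question} [/INST]"
    )

# In the notebook, chunks would come from a Pinecone similarity query
# over the bge-small embeddings of the question; hard-coded here.
chunks = [
    "SageMaker Studio provides managed Jupyter notebooks.",
    "Real-time endpoints serve low-latency inference.",
]
prompt = build_augmented_prompt("What does SageMaker Studio provide?", chunks)
```

The augmented prompt is then sent to the Llama 2 chat model exactly like an unaugmented question, which is what makes the with/without-context comparison in step 2 straightforward.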

From experimentation to large-scale deployment: deploy your models to SageMaker endpoints

  1. Deploy Llama 2 7B chat to a SageMaker real-time endpoint
  2. Deploy the embeddings model to a SageMaker real-time endpoint
  3. Ask the question again and augment the prompt with LangChain; this time the requests go to the SageMaker real-time endpoints
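Switching from the in-notebook model to a real-time endpoint mainly changes how requests and responses are serialized. A sketch of a LangChain-style content handler for a Llama 2 endpoint; the payload shape shown (TGI-style `inputs`/`parameters` and `generated_text`) is an assumption and should be verified against the actual deployed container:

```python
import json


class Llama2ContentHandler:
    """Serialize requests to, and deserialize responses from, a SageMaker
    real-time endpoint hosting Llama 2 chat. The JSON payload shape is
    assumed from the Hugging Face TGI container, not from this PR."""

    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt, model_kwargs):
        # Endpoint expects raw bytes of a JSON body.
        return json.dumps(
            {"inputs": prompt, "parameters": model_kwargs}
        ).encode("utf-8")

    def transform_output(self, output_bytes):
        # TGI-style responses are a list of {"generated_text": ...}.
        return json.loads(output_bytes)[0]["generated_text"]


handler = Llama2ContentHandler()
body = handler.transform_input(
    "[INST] What is RAG? [/INST]", {"max_new_tokens": 256}
)
text = handler.transform_output(
    b'[{"generated_text": "RAG combines retrieval with generation."}]'
)
```

With a handler like this, the same LangChain chain used in the notebook can point at the endpoint instead of the local model, so the RAG logic itself does not change between experimentation and deployment.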

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.