Build-RAGAI

Description

This project seeks to teach you how to build Python applications with generative AI functionality by using the LangChain and Transformers libraries.

While there is a section for OpenAI, most of the code that previously existed there has been repurposed and integrated with either the LangChain or Transformers libraries. This project includes code snippets, package examples, and Jupyter notebooks that you can augment, copy, or learn from, respectively.

If you're new to building AI-powered applications, I suggest you start by playing with and executing the code in the LangChain notebooks. Seeing the code in action, editing it yourself, and creatively brainstorming new ideas is the best way to learn.

Below you'll find links to, and descriptions of, sections of this project for easy navigation.

This README:

Getting Started
Installation
License

LangChain:

Code Snippets: Here you'll find pluggable Python components.
- bufferwindow_memory.py: A simple memory component that can be used in a LangChain conversation.
- chatopenai.py: A simple LLM component that can be used to return chat messages.
- multi_queryvector_retrieval.py: An advanced retriever component that combines the power of multi-querying and multi-vector retrieval.
Notebooks: Here you'll find Jupyter notebooks that guide you through the use of many different LangChain classes.
- MergedDataLoader: Learn how to embed and query multiple data sources via MergedDataLoader. In this notebook, we learn how to clone GitHub repositories and scrape web documentation before embedding them into a vectorstore, which we then use as a retriever. By the end of it, you should be comfortable using any data source as context in your own RAG projects.
- Custom Tools: Learn how to create and use custom tools in LangChain agents.
- Image Generation and Captioning + Video Generation: Learn to create an agent that chooses which generative tool to use based on your prompt. This example begins with the agent generating an image after refining the user's prompt.
- LangSmith Walkthrough: Learn how to use LangSmith tracing and pull prompts from the LangSmith Hub.
- Retrieval Augmented Generation: Get started with Retrieval Augmented Generation to enhance the performance of your LLM.
- MongoDB RAG: Perform similarity searching, metadata filtering, and question-answering with MongoDB.
- Pinecone and ChromaDB: A more basic but thorough walkthrough of performing retrieval augmented generation with two different vectorstores.
- FAISS and the HuggingFaceHub: Learn how to use FAISS indexes for similarity search with HuggingFaceHub embeddings. This example is a privacy-friendly option, as everything runs locally. No GPU required!
- Runnables and Chains (LangChain Expression Language): Learn the difference between Runnables and Chains in LangChain and how to use each. Here you'll dive deep into their specifics.
End to End Examples: Here you'll find scripts made to work out of the box.
- RAG with Agents: Learn to use Agents for RAG.
- Streamlit Chatbot: A simple Streamlit chatbot using OpenAI.
- Directory Loader: Use the DirectoryLoader class to load files for querying.
- PyPDF Directory Loader: Use the PyPDFDirectoryLoader class to load files for querying.
- Facebook AI Similarity Search: Use the FAISS vectorstore class to index files for querying.
- Vectorstore RAG: Learn how to use vectorstores in LangChain.
- Pinecone: Use a Pinecone vector database "Index" as a retriever and chat with your documents.

OpenAI:

Code Snippets: Here you'll find code snippets using the OpenAI Python library.
- Text to Speech: Use the OpenAI text-to-speech API to generate speech from text.
Notebooks: Here you'll find Jupyter notebooks that show you how to use the OpenAI Python library.
- Retrieval Augmented Generation: Get started with Retrieval Augmented Generation and Pinecone to enhance the performance of your LLM.

Transformers:

Code Snippets: Here you'll find code snippets using the Transformers Python library.
- Dolphin Mixtral: A simple function to generate text using pipeline.
Notebooks: Here you'll find Jupyter notebooks that show you how to use the Transformers Python library.
- Automatic Speech Recognition: Transcribe speech using Whisper-v3 in a Gradio demo.
Packages: Here you'll find CLI applications.
- Audio Transcription:
  - MicTranscription: Transcribe audio using a microphone.
  - Task Creation: Generates tasks based on transcribed audio.
- Train with Accelerate: Fine-tune a sequence classification model using Accelerate to speed up training.

Getting Started

Installation

Local Code Execution and Testing

This project is developed using PDM. Start by navigating to the root directory of this project, then install PDM using pip:

pip install -U pdm

Then you'll need to install the dependencies using PDM:

pdm install

This command will create a virtual environment in .venv and install the dependencies in that environment. If you're on macOS or Linux, you can run source .venv/bin/activate to activate the environment. On Windows, run .venv\Scripts\activate in Command Prompt or .venv\Scripts\Activate.ps1 in PowerShell.

By using a virtual environment, we avoid contaminating our global Python environment.

Once our virtual environment is set up, we need to select it as the kernel for the Jupyter Notebook. If you're in VS Code, you can do this at the top right of the notebook. If you're using a different IDE, check its documentation for how to select a notebook kernel.

When selecting the kernel, ensure you choose the one that's located inside of the .venv directory, and not the global Python environment.
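If you're unsure which interpreter a notebook is actually using, a quick check from the first cell can help. The sketch below relies only on the standard library; the function names in_virtualenv and uses_project_venv are made up for this illustration and are not part of this project. It assumes the notebook's working directory is the project root, which may not hold in every IDE.

```python
import sys
from pathlib import Path


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix points at the Python installation it was created from.
    return sys.prefix != sys.base_prefix


def uses_project_venv(project_root: str = ".") -> bool:
    """Return True when the interpreter's environment is <project_root>/.venv."""
    # Assumption: project_root is the directory that contains .venv.
    venv_dir = (Path(project_root) / ".venv").resolve()
    return Path(sys.prefix).resolve() == venv_dir


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
print("Using this project's .venv:", uses_project_venv())
```

If the last line prints False, the notebook is likely attached to a different kernel, or its working directory is not the project root.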

Test Your First Notebook

If you're totally new to building AI-powered applications with access to external data, specifically retrieval augmented generation, check out the RAG Basics notebook. It's the most straightforward notebook, and its concepts are built upon in every other 'RAG' notebook.
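To get a feel for the core idea before opening the notebook, here is a library-free sketch of the retrieval step. It is not the notebook's code: the notebooks use embeddings and a vectorstore, while this toy version swaps in plain word-count vectors and cosine similarity so it runs with the standard library alone. The function names and sample documents are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return up to k documents most similar to the query, best first."""
    query_vec = Counter(tokenize(query))
    scored = [
        (cosine_similarity(query_vec, Counter(tokenize(doc))), doc)
        for doc in documents
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop documents that share no words with the query.
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(question: str, context: list[str]) -> str:
    """Combine retrieved context and the question into one prompt string."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


# Hypothetical sample data, standing in for your own documents.
documents = [
    "PDM installs project dependencies into a .venv directory.",
    "FAISS indexes support similarity search over embeddings.",
    "Streamlit can be used to build a simple chatbot interface.",
]
question = "How do I run similarity search over embeddings?"
context = retrieve(question, documents, k=1)
print(build_prompt(question, context))
```

In a real RAG pipeline, the prompt built at the end would be sent to an LLM, and the word-count vectors would be replaced by embeddings stored in a vectorstore.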

Google Colab

Click the badge below to open the RAG Basics notebook in Colab.

Daethyra / Build-RAGAI
