This project seeks to teach you how to build Python applications with generative AI functionality by using the LangChain and Transformers libraries.
While there is a section for OpenAI, most of the code that previously existed there has been repurposed and integrated with either the LangChain or Transformers libraries. This project includes code snippets, package examples, and Jupyter notebooks that you can augment, copy, or learn from, respectively.
If you're new to building AI-powered applications, I suggest you start by running and experimenting with the code in the LangChain notebooks. Seeing the code in action, editing it yourself, and brainstorming new ideas are the best ways to learn.
Below you'll find links to, and descriptions of, sections of this project for easy navigation.
This README:
Code Snippets: Here you'll find pluggable Python components.
Notebooks: Here you'll find Jupyter notebooks that guide you through the use of many different LangChain classes.
MergedDataLoader. In this notebook, we learn how to clone GitHub repositories and scrape web documentation before embedding them into a vectorstore, which we then use as a retriever. By the end of it, you should be comfortable using any source as context in your own RAG projects.
End to End Examples: Here you'll find scripts made to work out of the box.
DirectoryLoader class to load files for querying.
PypdfDirectoryLoader class to load files for querying.
FacebookAISimilaritySearch class to load files for querying.
Pinecone vector database "Index" as a retriever and chat with your documents.
Code Snippets: Here you'll find code snippets using the OpenAI Python library.
Notebooks: Here you'll find Jupyter notebooks that show you how to use the OpenAI Python library.
Code Snippets: Here you'll find code snippets using the Transformers Python library.
pipeline.
Notebooks: Here you'll find Jupyter notebooks that show you how to use the Transformers Python library.
Packages: Here you'll find CLI applications.
This project is developed using PDM. You can install PDM using pip.
Start by navigating to the root directory of this project, then run:
pip install -U pdm
Then you'll need to install the dependencies using PDM:
pdm install
This command will create a virtual environment in .venv and install the dependencies in that environment. If you're on macOS or Linux, you can run source .venv/bin/activate to activate the environment. On Windows, you can run .venv/Scripts/activate or .venv/Scripts/activate.ps1 instead.
By using a virtual environment, we avoid cross-contaminating our global Python environment.
Once our virtual environment is set up, we need to select it as the kernel for the Jupyter notebooks. If you're in VS Code, you can do this at the top right of the notebook. If you're using a different IDE, you'll need to look for setup help online.
When selecting the kernel, ensure you choose the one that's located inside of the .venv directory, and not the global Python environment.
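To confirm that a notebook is really running on the project's environment, a quick check like the one below can be run in the first cell. It is a minimal sketch that relies only on the standard library. The helper name describe_interpreter is made up for illustration, and the check assumes the environment was created by venv or a recent virtualenv and lives in a directory named .venv, as described above.

```python
import sys
from pathlib import Path


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a virtual environment, sys.prefix points at the environment,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def describe_interpreter(venv_dir_name: str = ".venv") -> dict:
    """Summarize which interpreter is running (hypothetical helper)."""
    prefix = Path(sys.prefix)
    return {
        "executable": sys.executable,
        "in_virtual_environment": in_virtual_environment(),
        # True only when the environment directory has the expected name.
        "looks_like_project_venv": prefix.name == venv_dir_name,
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If in_virtual_environment is False, or the executable path does not point inside .venv, the notebook is probably still attached to the global Python environment and the kernel should be reselected.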
If you're totally new to building AI-powered applications with access to external data, specifically retrieval-augmented generation (RAG), check out the RAG Basics notebook. It's the most straightforward notebook, and its concepts are built upon in every other 'RAG' notebook.
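The retrieval step that RAG relies on can be illustrated without any external services. The sketch below is a toy, dependency-free stand-in: it uses word-count vectors and cosine similarity in place of a real embedding model and vector store. The names embed, TinyVectorStore, and build_prompt, along with the sample documents, are invented for illustration. This is not the LangChain API or the notebook's code, only a picture of the idea the notebooks build on.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a real project would use an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Hypothetical in-memory store: keeps (text, vector) pairs and ranks them per query."""

    def __init__(self) -> None:
        self._entries: list[tuple[str, Counter]] = []

    def add_documents(self, documents: list[str]) -> None:
        for doc in documents:
            self._entries.append((doc, embed(doc)))

    def retrieve(self, query: str, k: int = 1) -> list[str]:
        query_vector = embed(query)
        # Rank every stored document by similarity to the query, best first.
        ranked = sorted(
            self._entries,
            key=lambda entry: cosine_similarity(query_vector, entry[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Combine retrieved context and the question into a prompt for a language model."""
    joined = "\n".join(f"- {chunk}" for chunk in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


# Made-up sample documents standing in for loaded files or scraped pages.
store = TinyVectorStore()
store.add_documents([
    "PDM installs project dependencies into a virtual environment.",
    "A retriever returns the documents most relevant to a query.",
    "Jupyter notebooks run code in cells using a selected kernel.",
])

question = "What does a retriever return?"
context = store.retrieve(question, k=1)
prompt = build_prompt(question, context)
print(prompt)
```

In a real project, the word-count vectors would be replaced by a learned embedding model, the in-memory list by a vector store, and the prompt would be sent to a language model. The overall flow of embed, store, retrieve, and prompt stays the same.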
Click the badge below to open the RAG Basics notebook in Colab.