This repository contains code samples for a chatbot built with Amazon Bedrock, LangChain, and MongoDB Atlas Vector Search.
You can run this workshop using your own IDE (e.g. VSCode) or through Amazon SageMaker Studio.
Visit the Amazon Bedrock documentation for instructions on gaining model access. For Claude access, use the us-east-1 or us-west-2 region.
NOTE: This codebase uses the region us-west-2. Please update the region in the .env file if you use another region.
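As a rough illustration of the note above, the snippet below shows one way the region could be resolved from the environment, falling back to us-west-2. The helper name `resolve_region` and the allowed-region set are assumptions for this sketch, based only on the two regions this README names; the repository's own code may read the value differently.

```python
import os

# Regions this README names for Claude access (assumption: not exhaustive).
CLAUDE_REGIONS = {"us-east-1", "us-west-2"}
DEFAULT_REGION = "us-west-2"


def resolve_region(env=None):
    """Return the region from REGION, defaulting to us-west-2 (hypothetical helper)."""
    env = os.environ if env is None else env
    region = env.get("REGION", DEFAULT_REGION).strip() or DEFAULT_REGION
    if region not in CLAUDE_REGIONS:
        raise ValueError(
            f"REGION={region!r} is not one of {sorted(CLAUDE_REGIONS)}"
        )
    return region


# Passing a plain dict keeps the example independent of the real environment.
print(resolve_region({"REGION": "us-east-1"}))
```

Raising on an unlisted region is a design choice for this sketch only; a warning may be more appropriate if model access is available elsewhere in your account.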
In this section, we will set up a MongoDB Atlas cluster and database. We will launch a free-tier cluster using the Atlas console. You can find more information in the documentation. However, you
1 - Create an account on MongoDB Atlas or log in with an existing account.
2 - Choose Build a Database to start the database configuration. Select the M0 option from the free tier and AWS as the provider. Set the region to us-west-2 (or to the region where you run the workshop).
ℹ️ Note that it's OK for the MongoDB cluster to be in a different region (e.g. us-east-1) from the one the codebase uses (us-west-2).
3 - You should be taken to the Security Quickstart view. There, create the database Username and Password. You will need them later to set up connectivity to your cluster. Choose Create User once you have defined the credentials.
When asked about network access for the connection, keep the selection on My Local Environment and let your local IP, which is added automatically, be appended to the IP Access List.
Choose Finish and Close to finalize the setup.
ℹ️ If you skipped the user creation above, you can create the user using the "Database Access" option on the bottom left of the Atlas console.
4 - Now go to the "Network Access" option on the bottom left, click on "+ Add IP Address", choose the option "Allow Access From Anywhere", and set this entry to be deleted after the duration of the workshop.
⚠️ If you are using SageMaker Studio to follow this tutorial, this ensures the MongoDB instance is accessible over the internet.
5 - Now choose the "Database" option on the left panel, then click on "Browse Collections" of your newly created cluster. (It can take a few minutes for your cluster to become available)
6 - Click on "Add My Own Data" and create a database called langchain_db with a collection called e_commerce. Leave the additional preferences empty, and click "Create".
7 - Then, in the "Atlas Search" tab, create an index, select the option "Atlas Vector Search > JSON Editor", and click "Next".
8 - Select the collection e_commerce you just created. Set the Index name to products-metadata and add the following JSON into the JSON Editor:
    {
      "fields": [
        {
          "type": "vector",
          "numDimensions": 1536,
          "similarity": "cosine",
          "path": "embedding"
        }
      ]
    }
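To show how this index definition relates to a query, here is a minimal sketch that mirrors the JSON above as a Python dict and builds a `$vectorSearch` aggregation stage against it. The helper `build_vector_search_stage` and the `limit` and `num_candidates` defaults are illustrative assumptions, not values taken from this repository.

```python
INDEX_NAME = "products-metadata"

# Same definition as the JSON pasted into the Atlas JSON Editor.
INDEX_DEFINITION = {
    "fields": [
        {
            "type": "vector",
            "numDimensions": 1536,
            "similarity": "cosine",
            "path": "embedding",
        }
    ]
}


def build_vector_search_stage(query_vector, limit=5, num_candidates=100):
    """Build a $vectorSearch stage consistent with the index definition."""
    field = INDEX_DEFINITION["fields"][0]
    if len(query_vector) != field["numDimensions"]:
        raise ValueError(
            f"expected {field['numDimensions']} dimensions, got {len(query_vector)}"
        )
    return {
        "$vectorSearch": {
            "index": INDEX_NAME,
            "path": field["path"],  # must match the indexed field name
            "queryVector": list(query_vector),
            "numCandidates": num_candidates,
            "limit": limit,
        }
    }


# A zero vector is used only to exercise the helper; real queries use embeddings.
stage = build_vector_search_stage([0.0] * 1536)
print(stage["$vectorSearch"]["index"], stage["$vectorSearch"]["path"])
```

With a MongoDB driver, a stage like this would typically be the first element of the pipeline passed to the collection's aggregate call. The dimension check is there because a query vector whose length differs from numDimensions cannot be matched against the index.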
9 - Last, choose the "Database" option on the left panel, click the "Connect" button of your cluster, and follow the instructions under Driver -> Python to find the connection string.
Note down the connection string to use it later.
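One detail worth illustrating: if the username or password contains reserved characters such as `@`, `:` or `/`, they need to be percent-encoded before being placed in the connection string. The sketch below uses only the Python standard library; `build_mdb_uri` and the sample values are hypothetical.

```python
from urllib.parse import quote_plus


def build_mdb_uri(username, password, hostname):
    """Fill an Atlas SRV connection string with percent-encoded credentials."""
    user = quote_plus(username)
    pwd = quote_plus(password)
    return (
        f"mongodb+srv://{user}:{pwd}@{hostname}"
        "/?retryWrites=true&w=majority"
    )


# Hypothetical values, for illustration only.
print(build_mdb_uri("workshop_user", "p@ss:word/1", "cluster0.example.mongodb.net"))
```

If your credentials contain only letters and digits, the encoded string is identical to the one shown in the Atlas console.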
In this section, we will convert the text into vector representations and store them in the MongoDB Atlas vector index created in the previous section.
We start by using the correct credentials to access the database, unpacking the dataset and loading it to the vector database.
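Conceptually, each record is stored as a document that carries both the original text and its vector in the `embedding` field, which is the path the vector index points at. The sketch below illustrates only that document shape. `fake_embed` is a deterministic stand-in and not a real embedding model, the `text` and `metadata` field names are assumptions, and the 1536 dimension is taken from the index definition earlier in this README. In the workshop itself the vectors come from an embedding model rather than a hash.

```python
import hashlib

NUM_DIMENSIONS = 1536  # must match numDimensions in the vector index


def fake_embed(text, dims=NUM_DIMENSIONS):
    """Deterministic placeholder embedding; NOT a real model."""
    values = []
    counter = 0
    while len(values) < dims:
        digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
        # Map each byte (0..255) to a float in [-1, 1].
        values.extend(b / 127.5 - 1.0 for b in digest)
        counter += 1
    return values[:dims]


def to_document(text, metadata=None, embed=fake_embed):
    """Build a document with the vector under the indexed 'embedding' path."""
    return {"text": text, "embedding": embed(text), "metadata": metadata or {}}


doc = to_document("Red running shoes", {"category": "footwear"})
print(len(doc["embedding"]))
```

Passing the embedding function as a parameter keeps the document-building step separate from the model call, so the placeholder can be swapped for a real embedder without touching the rest.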
Create a .env file in the root directory and add the following environment variables:
a. The region for Amazon Bedrock as REGION
b. The connection string to your MongoDB cluster as MDB_URI. Use the actual username and password you created in step 3.
c. The database name as MDB_DATABASE
d. The collection name as MDB_COLLECTION
For example:
REGION=us-west-2
MDB_URI=mongodb+srv://{USERNAME}:{PASSWORD}@{HOSTNAME}.mongodb.net/?retryWrites=true&w=majority
MDB_DATABASE=langchain_db
MDB_COLLECTION=e_commerce
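A quick way to sanity-check the file before running the notebook is to parse it and confirm that all four variables are present. The parser below is a simplified standard-library sketch: the names `parse_env` and `check_env` are hypothetical, it handles only plain `KEY=VALUE` lines, and projects commonly use the `python-dotenv` package for this instead.

```python
REQUIRED_KEYS = ("REGION", "MDB_URI", "MDB_DATABASE", "MDB_COLLECTION")


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only: the MongoDB URI itself contains '='.
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


def check_env(values):
    """Return the required keys that are missing or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


# Sample content with placeholder credentials, for illustration only.
sample = """\
REGION=us-west-2
MDB_URI=mongodb+srv://user:secret@host.mongodb.net/?retryWrites=true&w=majority
MDB_DATABASE=langchain_db
MDB_COLLECTION=e_commerce
"""
env = parse_env(sample)
print(check_env(env))
```

To check a real file, read its contents and pass them to `parse_env`; an empty list from `check_env` means every required variable is set.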
Note: If you are running the notebook in VSCode, also make sure you run
pip install ipykernel
Using Amazon SageMaker Studio, this workshop has been tested using the following image and instance type.
Please wait while SageMaker initializes the kernel.
After successfully loading the kernel, you should see the kernel details as indicated below.
streamlit run chatbot_rag.py
While chatting, check your terminal window to see how the chain is running.
NOTE: Set verbose=False for the ConversationalRetrievalChain chain in the file langchain.py if you don't want to see the detailed output.