dodeeric / langchain-ai-assistant-with-hybrid-rag

This is an LLM chatbot coded with LangChain. The web interface is coded with Streamlit. It implements a hybrid RAG (keyword and semantic search) and chat memory.
https://bmae-ragai-webapp.azurewebsites.net
GNU General Public License v3.0

run chroma db as a server or use Chroma Cloud (now available serverlessly in the cloud) #43

Closed dodeeric closed 3 weeks ago

dodeeric commented 1 month ago

https://docs.trychroma.com/deployment/aws

https://python.langchain.com/v0.2/docs/integrations/vectorstores/chroma/#basic-example-using-the-docker-container

a) run the server:

docker pull chromadb/chroma
docker run -p 8000:8000 chromadb/chroma

Remark: By default, the Docker image will run with no authentication.
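Before moving on to the client, it can help to confirm that something is actually listening on the published port. A minimal sketch using only the standard library; `is_port_open` is a hypothetical helper name, and it only checks that a TCP connection succeeds, not that the listener is really Chroma:

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout`."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

With the Docker command above, `is_port_open("localhost", 8000)` would be the call to try.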

b) client:

pip install chromadb

from langchain_chroma import Chroma  # ==> from langchain
import chromadb  # ==> from trychroma
from chromadb.config import Settings

client = chromadb.HttpClient(settings=Settings(allow_reset=True))

# tell LangChain to use our client and collection name
db4 = Chroma(
    client=client,
    collection_name="my_collection",
    embedding_function=embedding_function,
)


CHROMA_SERVER = True
CHROMA_SERVER_HOST = "localhost"
CHROMA_SERVER_PORT = "8080"

if CHROMA_SERVER:
    # with server:
    chroma_client = chromadb.HttpClient(host=CHROMA_SERVER_HOST, port=CHROMA_SERVER_PORT)
    vector_db = Chroma(
        embedding_function=embedding_model,
        collection_name=COLLECTION_NAME,
        client=chroma_client,
    )
else:
    # without server:
    vector_db = Chroma(
        embedding_function=embedding_model,
        collection_name=COLLECTION_NAME,
        persist_directory="./chromadb",
    )
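The switch above could also be wrapped in a small helper so the two modes differ only in the keyword arguments passed to `Chroma`. This is a sketch, not the code in the repo: `build_chroma_kwargs` and its `client_factory` parameter are hypothetical names. The port is cast to `int` because the config stores it as a string. Note that this first draft uses port 8080 while the Docker command publishes 8000; the later comment in this thread uses 8000.

```python
from typing import Any, Callable, Dict, Optional


def build_chroma_kwargs(
    use_server: bool,
    host: str,
    port: str,
    collection_name: str,
    embedding_function: Any,
    client_factory: Optional[Callable[..., Any]] = None,
    persist_directory: str = "./chromadb",
) -> Dict[str, Any]:
    """Return keyword arguments for langchain_chroma.Chroma for either mode."""
    kwargs: Dict[str, Any] = {
        "collection_name": collection_name,
        "embedding_function": embedding_function,
    }
    if use_server:
        if client_factory is None:
            # Imported lazily so the no-server path does not need it here.
            import chromadb

            client_factory = chromadb.HttpClient
        # with server: talk to the running Chroma server over HTTP
        kwargs["client"] = client_factory(host=host, port=int(port))
    else:
        # without server: embedded Chroma persisted to a local directory
        kwargs["persist_directory"] = persist_directory
    return kwargs
```

Usage would then be something like `vector_db = Chroma(**build_chroma_kwargs(CHROMA_SERVER, CHROMA_SERVER_HOST, CHROMA_SERVER_PORT, COLLECTION_NAME, embedding_model))`.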

How to delete the DB?
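One possible answer, as an untested sketch: in no-server mode the DB is just the `persist_directory`, so removing that directory (with the app stopped) should be enough; in server mode the chromadb client exposes `delete_collection(name=...)`. The helper names below are hypothetical, and `client` is assumed to be a `chromadb.HttpClient` like the one created above.

```python
import shutil
from pathlib import Path


def delete_local_db(persist_directory: str) -> bool:
    """Remove the on-disk Chroma directory; return False if it does not exist."""
    path = Path(persist_directory)
    if not path.is_dir():
        return False
    # Deletes chroma.sqlite3 and the index files stored alongside it.
    shutil.rmtree(path)
    return True


def delete_server_collection(client, collection_name: str) -> None:
    """Drop one collection on a running Chroma server via the given client."""
    client.delete_collection(name=collection_name)
```

The `allow_reset=True` setting in the client snippet above suggests `client.reset()` as another route; that is not shown here.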

dodeeric commented 1 month ago

Without Docker: chroma run --path /db_path

https://docs.trychroma.com/guides

dodeeric commented 1 month ago

Implemented, but not yet working or properly tested.

dodeeric commented 3 weeks ago

CHROMA_SERVER = True
CHROMA_SERVER_HOST = "localhost"
CHROMA_SERVER_PORT = "8000"

$ chroma run --path ./chromadb

Check the Chroma vector DB: (OPTIONAL)

$ cd chromadb
$ sqlite3 chroma.sqlite3
sqlite> .tables ===> List of the tables
sqlite> select * from collections; ===> Name of the collection (bmae) & size of the vectors (3072)
sqlite> select count(*) from embeddings; ===> Number of records in the DB
sqlite> select id, key, string_value from embedding_metadata LIMIT 10 OFFSET 0; ===> Display JSON items and PDF pages
sqlite> PRAGMA table_info(embedding_metadata); ===> Structure of the table
sqlite> select * from embedding_metadata where string_value like '%Delper%'; ===> Display matching records
sqlite> select count(*) from embedding_metadata where string_value like '%Delper%'; ===> Display number of matching records
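The same checks can be scripted with Python's built-in `sqlite3` module instead of the interactive shell. A minimal sketch, assuming the table and column names shown in the session above (`embeddings`, `embedding_metadata`, `string_value`); `inspect_chroma_sqlite` is a hypothetical helper name, and the schema may differ between Chroma versions:

```python
import sqlite3
from contextlib import closing
from typing import Any, Dict


def inspect_chroma_sqlite(db_path: str, needle: str) -> Dict[str, Any]:
    """Return table names, number of embeddings and number of metadata matches."""
    with closing(sqlite3.connect(db_path)) as conn:
        cur = conn.cursor()
        # Equivalent of `.tables`
        tables = [
            row[0]
            for row in cur.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # Equivalent of `select count(*) from embeddings;`
        n_embeddings = cur.execute("SELECT count(*) FROM embeddings").fetchone()[0]
        # Equivalent of the `like '%...%'` count, with the pattern bound as a parameter
        n_matches = cur.execute(
            "SELECT count(*) FROM embedding_metadata WHERE string_value LIKE ?",
            (f"%{needle}%",),
        ).fetchone()[0]
    return {"tables": tables, "embeddings": n_embeddings, "matches": n_matches}
```

For the DB above this would be called as `inspect_chroma_sqlite("./chromadb/chroma.sqlite3", "Delper")`.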

==> Embedding is still done in the "no-server" Chroma DB!