chroma-core / chroma

the AI-native open-source embedding database
https://www.trychroma.com/
Apache License 2.0

How to switch from local to hosted version #1177

Open jfaraklit opened 1 year ago

jfaraklit commented 1 year ago

What happened?

This is how I use chroma locally:

from langchain.vectorstores import Chroma

def get_chroma():
    chroma = Chroma(
        collection_name='llm',
        embedding_function=embedding,
        persist_directory='./chroma.db'
    )
    return chroma

In my app I call self.db = get_chroma() and then use it with calls such as self.db.add_documents(....).

Now I have hosted a version on EC2. How do I use the hosted version in my app instead of the local one?

Versions

latest

Relevant log output

No response

tazarov commented 1 year ago

@jfaraklit, you can do something like this:

from langchain.vectorstores import Chroma
import chromadb

# Client that talks to the hosted Chroma server over HTTP
chroma_client = chromadb.HttpClient(host="http://my-chroma.ec2.aws.com:8000")

def get_chroma(client):
    # `embedding` is the embedding function already defined in your app
    chroma = Chroma(
        collection_name='llm',
        embedding_function=embedding,
        client=client
    )
    return chroma

self.db = get_chroma(chroma_client)

and something like this to use it:

self.db.add_documents(....)
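HttpClient also takes the host, port, and SSL flag as separate arguments, which avoids depending on how a particular chromadb version interprets a full URL passed as host. A minimal sketch, assuming the server URL from the snippet above; http_client_kwargs and connect are hypothetical helper names, not part of chromadb:

```python
from urllib.parse import urlparse


def http_client_kwargs(url: str) -> dict:
    """Split a server URL into host/port/ssl keyword arguments (hypothetical helper)."""
    parsed = urlparse(url)
    use_ssl = parsed.scheme == "https"
    # Fall back to 443 for https, otherwise to Chroma's default server port 8000.
    port = parsed.port or (443 if use_ssl else 8000)
    return {"host": parsed.hostname, "port": port, "ssl": use_ssl}


def connect(url: str):
    """Create an HTTP client for the hosted server (hypothetical helper)."""
    # Imported here so the URL parsing above works without chromadb installed.
    import chromadb

    return chromadb.HttpClient(**http_client_kwargs(url))
```

The result of connect("http://my-chroma.ec2.aws.com:8000") can then be passed to get_chroma() in place of chroma_client.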

jfaraklit commented 1 year ago

Yeah, that works. Thanks a lot. I am getting another strange issue, though. When I run my app, add documents to the db, and then query, I get results. If I stop my app and start it again, the same query returns [] (an empty array). This happens with both the local db and the server. Any thoughts?

tazarov commented 1 year ago

@jfaraklit, with a local DB you need to use PersistentClient(path="/path/to/data"), whereas on the server you need to make sure the IS_PERSISTENT=1 env var is set.

How do you run the server, and can you share your local client code?

jfaraklit commented 1 year ago

@tazarov Actually, I was somehow running a delete command, which is why the collection was gone on the subsequent start. So I figured that one out.

Glad to know that IS_PERSISTENT=1 is for the server. So basically it will create and keep the data on the EC2 container? Also, how do you see/query data on the EC2 container? Do I need to install the chroma CLI or anything on the container? If you can point me to some docs or show me one command to count the docs, I will figure out the rest.

tazarov commented 1 year ago

@jfaraklit, for AWS, have you tried https://github.com/chroma-core/chroma/tree/main/examples/deployments/aws-terraform ? It will create an EC2 instance with an EBS volume mounted at /chroma-data, where your chroma data will be stored.

Just FYI, the default chroma docker deployment keeps the data in the container, so it is not a good candidate if you can't afford to "lose" your data. Of course, you can add local mounts (which the above AWS deployment does).
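On counting the docs: one option is to query the server from any machine that can reach it, using the same HTTP client, rather than installing anything on the container. A minimal sketch, assuming the collection is named 'llm' as in the snippets above; summarize_collection is a hypothetical helper, while get_collection, count, and peek are chromadb client/collection calls:

```python
def summarize_collection(client, name: str = "llm", sample: int = 3) -> dict:
    """Return the document count and a few sample IDs for one collection."""
    collection = client.get_collection(name)
    # peek() returns the first few stored records; only their IDs are kept here.
    preview = collection.peek(limit=sample)
    return {"name": name, "count": collection.count(), "sample_ids": preview["ids"]}


def summarize_remote(host: str, port: int = 8000, name: str = "llm") -> dict:
    """Connect to a hosted server and summarize one collection (hypothetical helper)."""
    # Imported here so summarize_collection can be exercised without chromadb.
    import chromadb

    client = chromadb.HttpClient(host=host, port=port)
    return summarize_collection(client, name)
```

For example, summarize_remote("my-chroma.ec2.aws.com") would return the count for the 'llm' collection, assuming the server is reachable on port 8000.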

jfaraklit commented 1 year ago

Yeah, this is neat, and I need an EBS volume. I will move to this type of EC2 setup soon. In the meantime, when I added IS_PERSISTENT I got the error below.

def get_chroma(client):
    chroma = Chroma(
        collection_name='llm',
        embedding_function=embedding,
        #persist_directory='./chroma.db',
        IS_PERSISTENT=1,
        client=client
    )
    return chroma

File "/Users/jawed-mac/ELL/openAI/RandD/crafted-catalyst/realtime_ai_character/database/chroma.py", line 19, in get_chroma
    chroma = Chroma(
TypeError: __init__() got an unexpected keyword argument 'IS_PERSISTENT'

tazarov commented 1 year ago

@jfaraklit IS_PERSISTENT is used in the client/server deployment mode, where it is passed as an env var to the server. Maybe I am confusing you. If LangChain (LC) is what you want to use, then keep using persist_directory, as this is the config value you need for the local persistent client.
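To make the split concrete, here is a rough sketch of a get_chroma() that picks the hosted server or the local persist_directory at startup. The environment variable names CHROMA_HOST, CHROMA_PORT, and CHROMA_DIR are assumptions made up for this example (they are not chroma settings), and select_chroma_mode is a hypothetical helper:

```python
import os
from typing import Mapping


def select_chroma_mode(env: Mapping[str, str]) -> dict:
    """Decide between the hosted server and a local persistent directory."""
    host = env.get("CHROMA_HOST")  # hypothetical variable name
    if host:
        port = int(env.get("CHROMA_PORT", "8000"))  # hypothetical variable name
        return {"mode": "hosted", "host": host, "port": port}
    # CHROMA_DIR is also hypothetical; the default matches the local snippet above.
    return {"mode": "local", "persist_directory": env.get("CHROMA_DIR", "./chroma.db")}


def get_chroma(embedding, env: Mapping[str, str] = os.environ):
    """Build the LangChain Chroma wrapper for whichever mode is configured."""
    # Imported here so the selection logic works without these packages installed.
    import chromadb
    from langchain.vectorstores import Chroma

    cfg = select_chroma_mode(env)
    if cfg["mode"] == "hosted":
        # Hosted: persistence is the server's job (IS_PERSISTENT=1 on the server).
        client = chromadb.HttpClient(host=cfg["host"], port=cfg["port"])
        return Chroma(collection_name="llm", embedding_function=embedding, client=client)
    # Local: persistence is configured through persist_directory.
    return Chroma(
        collection_name="llm",
        embedding_function=embedding,
        persist_directory=cfg["persist_directory"],
    )
```

With this shape, IS_PERSISTENT never appears in the app code; it only matters in the environment of the server process.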

jfaraklit commented 1 year ago

Got it. Thanks.

abhishek351 commented 11 months ago

@jfaraklit, you can do something like this:

from langchain.vectorstores import Chroma
import chromadb

chroma_client = chromadb.HttpClient(host="http://my-chroma.ec2.aws.com:8000")

def get_chroma(client):
    chroma = Chroma(
        collection_name='llm',
        embedding_function=embedding,
        client=client
    )
    return chroma

self.db = get_chroma(chroma_client)

and something like this to use it:

self.db.add_documents(....)

Can you please explain what the term "client" refers to in this code?

jeffchuber commented 11 months ago

@abhishek351 client is the Python class that creates a connection to the DB and ferries requests to the DB. With LangChain, it's helpful and more flexible to define the client outside LangChain and pass it in.