chroma-core / chroma

the AI-native open-source embedding database
https://www.trychroma.com/
Apache License 2.0

Multi-process concurrent duckdb+parquet access to disk-persisted db #666

Closed · ibeckermayer closed this 1 year ago

ibeckermayer commented 1 year ago

I'm using chroma for a relatively straightforward project that initializes the chroma client in the "in-memory with saving/loading to disk" mode like

import chromadb
from chromadb.config import Settings
from chromadb.utils import embedding_functions

client = chromadb.Client(
    Settings(chroma_db_impl="duckdb+parquet", persist_directory=".chroma/my-db")
)
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key=api_key,  # defined elsewhere
    model_name="text-embedding-ada-002",
)
collection = client.get_or_create_collection(name="my-db", embedding_function=openai_ef)

and then uses the client from a Flask API endpoint like (abbreviated)

import logging

from flask import Flask, request

app = Flask(__name__)

@app.route('/api/ask', methods=['POST'])
def ask():
    # omitted... (question is parsed from the request here)
    query_result = collection.query(query_texts=[question], n_results=4)
    # continue...

if __name__ == '__main__':
    # LOG_LEVEL is defined elsewhere in the app
    app.run(host='0.0.0.0', port=80, debug=LOG_LEVEL <= logging.DEBUG)

This app is deployed behind gunicorn with 7 worker processes, so effectively the collection is created 7 times, and the same "in-memory with saving/loading to disk" database can be queried concurrently by each of these worker processes.
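For concreteness, the gunicorn setup is along these lines (the file name and exact values here are illustrative, not my exact project):

# gunicorn.conf.py -- illustrative config for the deployment described above
workers = 7            # seven worker processes, each importing the app module
bind = "0.0.0.0:80"    # so each process constructs its own Chroma client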

I'm seeking guidance on whether this architecture makes sense for a prototype application where we don't expect much concurrent use (7 workers is probably overkill; the main reason we need any concurrency is that long OpenAI API calls are involved, and we want several people to be able to try out the prototype at once). Presuming the answer is yes, I'd also like to know how far we might take this approach going forward.

I recognize that in the documentation you say

For production use cases, an in-memory database will not cut it. Run docker-compose up -d --build to run a production backend in Docker on your local computer. Simply update your API initialization and then use the API the same way as before.

but I'm nevertheless wondering:

  1. Presuming the database is read-only (e.g. I'm only calling collection.query), can I get away with this multiprocessing architecture using the duckdb+parquet implementation?
  2. What if writes were involved: would duckdb+parquet be fit for that kind of concurrency?
  3. What changes when I run Chroma as a separate service, either in a Docker container on my own machine or on an AWS server as shown in your docs? What underlying implementation is Chroma using in that case? Is it significantly more memory efficient and/or fit for concurrency as compared to my current approach of 7 disk-persistence clients?

Ideally I'd like to keep my project as simple as possible and avoid adding a separate Docker-mediated Chroma server.

francisjervis commented 1 year ago

Thanks for posting this. I'm working on Flask-Langchain and have been thinking about this issue, along with the related question of how to manage multiple Chroma objects within the same request (e.g. when an agent needs QA sources from both a global and a user-specific collection). I suspect there are also Flask-specific points here, like managing the Chroma lifecycle in teardown handlers... but as far as I can see, there is no method to explicitly terminate a Chroma object.

ibeckermayer commented 1 year ago

@jeffchuber @atroyn can you provide any insight here?

atroyn commented 1 year ago

At the moment the worker processes each load their own copy of the entire DB into memory. This works for read-only access, but it is inefficient, since you have 7 copies of the DB in memory at once. For read/write, it would introduce a data race that you'd have to manage between the clients, because on persist they'd stomp on one another's writes in the same persist directory.
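To make the read/write hazard concrete, here's a hypothetical interleaving (reusing the client/collection names from the snippet above; this is not Chroma internals, just the last-writer-wins effect):

# Worker A and worker B each hold an independent in-memory copy of the DB.

# worker A
collection.add(ids=["doc-a"], documents=["added by A"])
client.persist()  # writes A's copy (which lacks doc-b) to .chroma/my-db

# worker B, a separate process that never saw doc-a
collection.add(ids=["doc-b"], documents=["added by B"])
client.persist()  # overwrites the same directory; doc-a is lost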

In the client/server architecture, there is a single server backend, with an underlying doc store and index, which can be accessed by multiple clients. You might be interested in the lightweight client, which is easier to install and can connect to a single server backend: https://docs.trychroma.com/usage-guide#using-the-python-http-only-client

This should be a lot more memory efficient and should handle concurrency better, since each 'thin' client talks to the backend server separately.
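For illustration, initialization against a server backend looks roughly like this (host and port are assumptions; collection usage is unchanged from the in-process case):

import chromadb
from chromadb.config import Settings

# Thin client: no local DB; every call goes over HTTP to the server backend.
client = chromadb.Client(Settings(
    chroma_api_impl="rest",
    chroma_server_host="localhost",  # assumption: server reachable locally
    chroma_server_http_port="8000",  # assumption: default port
))
collection = client.get_or_create_collection(name="my-db")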

ibeckermayer commented 1 year ago

Makes perfect sense and thanks for pointing me to the lightweight client.