Closed ibeckermayer closed 1 year ago
Thanks for posting this - I'm working on Flask-Langchain and have been thinking about this issue, along with the related question of how to manage multiple Chroma objects within the same request (e.g. when an agent needs QA sources from both a global and a user-specific collection). I suspect there are also Flask-specific concerns here, like managing the Chroma lifecycle in teardown handlers... but as far as I can see, there is no method to explicitly terminate a Chroma object.
@jeffchuber @atroyn can you provide any insight here?
At the moment the worker processes each see their own copy of the entire DB loaded into memory. This works for read-only use, but is inefficient since you have 7 copies of the DB in memory at once. For read/write, it would introduce a data race that you'd have to manage between the clients, because on persist they'd stomp on one another's writes in the same persist directory.
In the client/server architecture, there is a single server backend with an underlying doc store and index which can be accessed by multiple clients. You might be interested in the lightweight client, which is easier to install and can connect to a single server backend: https://docs.trychroma.com/usage-guide#using-the-python-http-only-client
This should be a lot more memory-efficient, and it handles concurrency better since each 'thin' client talks to the backend server separately.
Makes perfect sense and thanks for pointing me to the lightweight client.
I'm using chroma for a relatively straightforward project that initializes the chroma client in the "in-memory with saving/loading to disk" mode like
and then uses the client from a Flask API endpoint like (abbreviated)
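The endpoint code is likewise not shown in this thread; a minimal sketch of the pattern described, with hypothetical names (`create_app`, the `/query` route, and the payload shape are all assumptions), might be:

```python
from flask import Flask, jsonify, request

def create_app(collection):
    # `collection` is the Chroma collection created at startup, as in
    # the setup above; under gunicorn, each worker builds its own copy.
    app = Flask(__name__)

    @app.route("/query", methods=["POST"])
    def query():
        payload = request.get_json()
        # collection.query is the real Chroma API; the route name and
        # request payload shape here are illustrative.
        results = collection.query(
            query_texts=[payload["question"]],
            n_results=5,
        )
        return jsonify(results)

    return app
```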
This app is deployed behind gunicorn with 7 worker processes, so effectively I'm creating the collection 7 times, and the same "in-memory with saving/loading to disk" database can be queried concurrently by each of these worker processes. I'm seeking guidance as to whether this architecture makes sense for a prototype application where we don't expect much concurrent use (7 workers is probably overkill; the main reason we need any concurrency is that there are long OpenAI API calls involved and we want several people to be able to try out the prototype at once), and, presuming the answer is yes, how far we might take this approach going forward.
I recognize the caveat you give in the documentation, but I'm nevertheless wondering: since my endpoints only ever read from the database after its initial creation (i.e. only call `collection.query`), can I get away with this multiprocessing architecture using the duckdb+parquet implementation? Ideally I'd like to keep my project as simple as possible and avoid adding a separate Docker-mediated Chroma server.