@levand @atroyn Opened this issue to discuss. I realized we should talk through this while building https://github.com/chroma-core/chroma/pull/13
Making a list (will be updated inline) of projects that we can perhaps find some inspiration from...
- Logging / APM
- Product Analytics
- Orchestration
- ML monitoring / experiment management
to be continued........
This is a great question.
We're talking about an initial MVP, right? My question is... does a "Wolf A" model actually satisfy the "V" in "MVP"... is it viable? It could make a good demo and help generate sales leads, for sure. But as I understand the product, we're almost certainly going to need Wolf B for any kind of production use. For example, as soon as we start persisting data, we're going to need to do it somewhere other than a developer's laptop or CI instance.
So I'm going to go out on a limb and say that I don't think a completely in-process model makes sense even for an MVP. So, in this hypothesis, we're going to have a frontend and a backend (`chroma-client` and `chroma-server`, as hypothesized).
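A minimal sketch of the two shapes under discussion (all names here are hypothetical, not the actual Chroma API): Wolf A runs everything in the caller's process, Wolf B is a thin frontend talking to a separate backend over the network.

```python
from abc import ABC, abstractmethod


class ChromaAPI(ABC):
    @abstractmethod
    def log(self, embedding: list[float], metadata: dict) -> None: ...


class InProcessAPI(ChromaAPI):
    """Wolf A: storage and computation live in the caller's process."""

    def __init__(self) -> None:
        self._records: list[tuple[list[float], dict]] = []

    def log(self, embedding: list[float], metadata: dict) -> None:
        self._records.append((embedding, metadata))  # dies with the process


class RemoteAPI(ChromaAPI):
    """Wolf B: the frontend just ships data to a chroma-server."""

    def __init__(self, base_url: str) -> None:
        self._base_url = base_url

    def log(self, embedding: list[float], metadata: dict) -> None:
        import json
        import urllib.request

        req = urllib.request.Request(
            f"{self._base_url}/log",  # hypothetical endpoint
            data=json.dumps({"embedding": embedding, "metadata": metadata}).encode(),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req)
```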
But there's still a decision point to be made here: where does the business logic live?

1. A thick backend and thin client - the server does the computation.
2. A thick client and thin backend - the client does the computation and the server mostly stores.
3. Business logic split across both.

I do tend to disregard #3, just because it could get a lot more complicated for probably not a ton of benefit.
When trying to compare options 1 and 2, we need to consider:

- Satisfying the computational requirements is probably going to be easier in an option-1 model, since it's easier to establish a requirement that a Chroma server has $x amount of RAM and CPU/GPU power than it is to require that of every possible client.
- The network transport constraints are another question. Ultimately, it's a wash, because the same data has to be brought together at some point to perform the operations we want. For batch-mode operations it genuinely doesn't matter: the same amount of data has to traverse some wire, somewhere, either way. For high-frequency non-batched operations, you have to add 1-3 milliseconds of latency to each request, and in that case it could make sense to have the computation local to the request, if it's a particularly performance-intensive scenario.
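To put a rough number on that last point (the figures below are illustrative assumptions, not measurements), the per-request overhead only bites when requests can't be batched:

```python
# Back-of-envelope: per-request network overhead for unbatched logging.
n_requests = 1_000_000   # e.g. logging one embedding per request
overhead_s = 0.002       # ~2 ms of added latency per round trip

print(f"{n_requests * overhead_s / 60:.1f} minutes of pure transport overhead")
# -> 33.3 minutes unbatched, vs. a single round trip's overhead if batched
```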
Here is another slightly different perspective - I like how MLFlow handles tracking - https://mlflow.org/docs/latest/tracking.html#how-runs-and-artifacts-are-recorded (ignore the `artifact` part of the charts, since we don't have heavy files to move around like MLFlow does). In this paradigm, the lightest-weight place things are serialized is a `.chroma` folder. (MLFlow is 100% Python as well, and Apache 2.0.) I think how they accomplish this is that all the code is packaged up in the pip project. That means there is not a separate client or server... it's just about which code you are using in various scenarios. I guess the downsides of that are (1) the size of the project in megabytes, and (2) versioning frequency and version management across the frontend and backend without explicit version pinning.
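For concreteness, a minimal sketch of what that lightest-weight mode could look like (the file layout and function names here are assumptions for illustration, not MLFlow's or Chroma's actual scheme):

```python
import json
import pathlib

import numpy as np


def log_embeddings(root: str, name: str, embeddings: np.ndarray, metadata: dict) -> None:
    """Serialize everything to a local .chroma folder; no server involved."""
    run_dir = pathlib.Path(root) / ".chroma" / name
    run_dir.mkdir(parents=True, exist_ok=True)
    np.save(run_dir / "embeddings.npy", embeddings)          # the vectors
    (run_dir / "metadata.json").write_text(json.dumps(metadata))


log_embeddings(".", "demo", np.random.rand(100, 512), {"model": "resnet50"})
```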
For (1)/(2)/(3) "where to put the business logic": I 100% agree that (3) is bad. My general bias is towards having a thick backend and thin client, especially since most operations will need context from the db in order to complete (they will need to query the NN index, for example).
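To illustrate why that pushes the logic server-side: even a basic similarity query has to touch all the stored vectors. A brute-force sketch (a real server would use a proper NN index; names are illustrative):

```python
import numpy as np


def nearest_neighbors(corpus: np.ndarray, query: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k corpus rows closest to `query` (L2 distance)."""
    distances = np.linalg.norm(corpus - query, axis=1)
    return np.argsort(distances)[:k]


corpus = np.random.rand(10_000, 128)   # lives with the server, not the client
print(nearest_neighbors(corpus, np.random.rand(128)))
```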
The discussion is still very open! :) @atroyn join the mix as well!
One additional note... MLFlow is purely a store - it does no processing on the data. That is different from us, where our processing is computationally expensive.
@atroyn should we try to make the in-memory thing work at all? I'm starting to tend towards "not worth it".
One other distinction to make: "in memory" doesn't necessarily mean "all in client with no backend."
It could also mean a full backend/frontend split, but with the backend implemented in-memory + very simple persistence (as opposed to a more complex vector database) for the MVP.
In the past when I've advocated to start "in memory", that's what I was referring to... a full backend but with a trivial in-memory implementation. Not trying to cram all the computation into the frontend.
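A minimal sketch of that reading, assuming a hypothetical storage interface: the backend keeps its full API, but the MVP implementation is just a dict with trivial persistence.

```python
import pickle
from typing import Any


class InMemoryStore:
    """MVP backend storage: a dict, optionally pickled to disk."""

    def __init__(self) -> None:
        self._data: dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value

    def get(self, key: str) -> Any:
        return self._data[key]

    def persist(self, path: str) -> None:
        # "very simple persistence" in place of a real vector database
        with open(path, "wb") as f:
            pickle.dump(self._data, f)
```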
MLFlow is an interesting model, but note that they are highly modular and support many different topologies. We could do that too, but my guess is we want to streamline and present "one way" as the default. We can enable other modalities as options if that's where the market pushes us.
I am inclined towards a thin-client Wolf B for a few reasons:
There are risks around things like breaking client/server versioning in the future, and there is added complexity, but this gut-checks as the right move to me.
Anthropic has something similar-ish called Garçon, for probing remotely running models, with the aim that their scientists can easily examine something running somewhere else; it uses a client-server setup.
I also read Luke's "in memory" as referring to where the processing is done: flat storage, with all computation done in memory rather than in a vector DB. I favor this as well, for development speed and ease of deployment onto the user's machine.
Ok I agree with all of this. I think it was good to talk through, thanks for the thoughts! Keeping things simpler and opinionated is the right way to go (assuming we have the right opinions of course).
So I believe we all agree we will move forward with:

- `chroma-client` - a thin Python client that writes to the backend
- `chroma-server` - a fat Python backend

That means that if a user is using a notebook, they will need to do `docker-compose up` (or whatever our backend init script is) for the notebook. Docker does work on Google Colab! I am ok with this. Just confirming we are all on the same page here.
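To make that concrete, the notebook flow might look something like this (the URL, port, and `/heartbeat` endpoint are assumptions, not a defined API):

```python
import urllib.request

SERVER_URL = "http://localhost:8000"  # assumed default

# 1. In a terminal / notebook cell:  !docker-compose up -d
# 2. Then the thin client only needs to reach the server over HTTP:
with urllib.request.urlopen(f"{SERVER_URL}/heartbeat") as resp:  # hypothetical endpoint
    print(resp.status)  # 200 -> backend is up, client can start ferrying data
```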
There is the additional question of how thin the client is... and specifically whether the backend has the idea of `log`, or whether the client simply knows to call the things that `log` does (e.g. store the data here, trigger this reprocessing). The current open discussion is here: https://github.com/chroma-core/chroma/pull/13#discussion_r1002612123
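For reference, the two shapes being weighed look roughly like this (endpoint names are hypothetical, and `post` stands in for whatever HTTP helper the client uses):

```python
# Option A: the backend owns the `log` concept; the client makes one call.
def log_thin_client(post, record):
    post("/log", record)  # server stores AND triggers reprocessing itself


# Option B: the client knows what `log` means and composes primitives.
def log_smart_client(post, record):
    post("/data", record)     # store the data here
    post("/process", record)  # trigger this reprocessing
```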
Closing this issue as we have agreed on a direction:

- `chroma-client` is mainly responsible for the public API and ferrying data to the backend.
- `chroma-server` is mainly responsible for storing data and running computations.

Inside chroma there are 2 wolves:

Wolf B is fairly obvious... there is `chroma-client` and `chroma-server`, and they work together.

Wolf A, however... does `chroma-client` have extra functionality to handle the in-memory use case? Or is there separate code that is shared between `chroma-server` and `chroma-client`?

I think some code has to be shared... the code for "doing the maths" - and possibly more, things like data formats, etc. How should this be structured?
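One possible answer, sketched under the assumption of a shared pip-installable core package (none of these names are decided):

```python
# chroma_core/math.py -- the shared "doing the maths" code
import numpy as np


def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# chroma_core/formats.py -- shared data formats
from dataclasses import dataclass


@dataclass
class EmbeddingRecord:
    embedding: list[float]
    metadata: dict

# chroma-client and chroma-server would both depend on chroma_core, so the
# Wolf A (in-memory) path can reuse exactly the code the server runs, without
# the client growing a second implementation of the maths.
```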