amikos-tech / chromadb-java-client

A thin client for Chroma Vector DB implemented in Java
MIT License

Default ChromaDB embeddings `all-MiniLM-L6-v2` #27

Closed: namedgraph closed this 1 month ago

namedgraph commented 5 months ago

My understanding was that ChromaDB's default embeddings run locally and do not require an API key. However, I cannot find an example like this in the README; all the examples require an API key. Am I missing something?

namedgraph commented 5 months ago

I got myself a HuggingFace API key and tried using HuggingFaceEmbeddingFunction.

The documents get created in ChromaDB with correct documents and metadata, but the embeddings field is null. Is that expected?

tazarov commented 4 months ago

@namedgraph, Chroma would not accept a request with null as embeddings. Let me have a look. As for the default embedding function, you are right that it runs locally; however, that applies to the Python client.

I'll investigate whether ONNX Runtime (it is written in C/C++, so it might not be as platform-independent as Java) can be executed from within Java.

tazarov commented 4 months ago

@namedgraph, you are in luck: MS has added Java support, see https://github.com/microsoft/onnxruntime/blob/main/java/README.md.

I'll implement it shortly.

haqian555 commented 1 month ago

@tazarov Hello, how can I run ChromaDB's default embeddings locally?

tazarov commented 1 month ago

@namedgraph and @haqian555, I spent some time on this today and I'm happy to say that I've managed to get a default embedding function running with the MiniLM model and generating results in line with what the original Chroma EF produces. The good news is that it will also work for better models that have been converted to ORT format.

I'll run some tests that prove this works not only on my machine :) I'll add this functionality over the next couple of days. Thanks for your patience :).
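
For readers wondering what "in line with the original Chroma EF" involves beyond running the model: for `all-MiniLM-L6-v2`, the SentenceTransformers pipeline mean-pools the token embeddings (ignoring padding) and then L2-normalizes the result. The sketch below shows only that post-processing step in plain Java. It is not taken from this client's code; the class and method names are hypothetical, and the token embeddings are assumed to have already been produced by the model.

```java
import java.util.Arrays;

// Hypothetical helper, not part of chromadb-java-client.
final class MeanPoolingSketch {

    /** Averages the token vectors whose attention-mask entry is non-zero (padding is skipped). */
    static float[] meanPool(float[][] tokenEmbeddings, long[] attentionMask) {
        if (tokenEmbeddings.length == 0 || tokenEmbeddings.length != attentionMask.length) {
            throw new IllegalArgumentException("need one mask entry per token and at least one token");
        }
        int dim = tokenEmbeddings[0].length;
        float[] pooled = new float[dim];
        int count = 0;
        for (int t = 0; t < tokenEmbeddings.length; t++) {
            if (attentionMask[t] == 0) {
                continue; // padding token, does not contribute
            }
            count++;
            for (int d = 0; d < dim; d++) {
                pooled[d] += tokenEmbeddings[t][d];
            }
        }
        float denom = Math.max(count, 1); // avoid division by zero for an all-padding input
        for (int d = 0; d < dim; d++) {
            pooled[d] /= denom;
        }
        return pooled;
    }

    /** Scales the vector to unit Euclidean length. */
    static float[] l2Normalize(float[] v) {
        double sumSq = 0.0;
        for (float x : v) {
            sumSq += (double) x * x;
        }
        double norm = Math.max(Math.sqrt(sumSq), 1e-12); // guard against a zero vector
        float[] out = new float[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = (float) (v[i] / norm);
        }
        return out;
    }

    public static void main(String[] args) {
        // Toy 2-dimensional "token embeddings"; the third token is padding.
        float[][] tokens = {{1f, 2f}, {3f, 4f}, {100f, 100f}};
        long[] mask = {1, 1, 0};
        System.out.println(Arrays.toString(l2Normalize(meanPool(tokens, mask))));
    }
}
```

Skipping either step (or pooling over padding tokens) is a common reason a port produces vectors that differ from the Python output.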

haqian555 commented 1 month ago

That's great, I really appreciate your work.

tazarov commented 1 month ago

@haqian555 and @namedgraph, the default EF functionality is now merged. Sorry it took a little longer, but I had to make sure its output was identical to that of Chroma's default EF and the SentenceTransformers equivalent in Python.
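
One way to check the kind of parity described above is to embed the same text with both implementations and compare the vectors numerically. The following is a minimal sketch of such a comparison, not the test that was actually used here; the class name, method names, and the tolerance value are all assumptions chosen for illustration.

```java
// Hypothetical comparison helper, not part of chromadb-java-client.
final class EmbeddingComparison {

    /** Cosine similarity of two equal-length, non-zero vectors. */
    static double cosineSimilarity(float[] a, float[] b) {
        requireSameLength(a, b);
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("cosine similarity is undefined for a zero vector");
        }
        return dot / Math.sqrt(normA * normB);
    }

    /** Largest absolute per-component difference between the two vectors. */
    static double maxAbsDiff(float[] a, float[] b) {
        requireSameLength(a, b);
        double max = 0.0;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.abs((double) a[i] - b[i]));
        }
        return max;
    }

    /** True when every component differs by at most the given tolerance. */
    static boolean closeEnough(float[] a, float[] b, double tolerance) {
        return maxAbsDiff(a, b) <= tolerance;
    }

    private static void requireSameLength(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
    }

    public static void main(String[] args) {
        // Stand-ins for a Java-produced and a Python-produced embedding of the same text.
        float[] fromJava = {0.12f, -0.48f, 0.33f};
        float[] fromPython = {0.12f, -0.48f, 0.33001f};
        System.out.println("cosine similarity: " + cosineSimilarity(fromJava, fromPython));
        System.out.println("within 1e-4: " + closeEnough(fromJava, fromPython, 1e-4)); // tolerance is an assumed value
    }
}
```

A per-component tolerance check is stricter than cosine similarity alone, since it also catches a missing normalization step that leaves the direction unchanged.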