chroma-core / chroma

the AI-native open-source embedding database
https://www.trychroma.com/
Apache License 2.0

[Bug]: What's the difference between DefaultEmbeddingFunction() and SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2")? #2748

Closed h3clikejava closed 2 months ago

h3clikejava commented 2 months ago

What happened?

I saved some embeddings using the default settings, like this:

    collection = client.get_or_create_collection(name=db_name)

I can then fetch the data with DefaultEmbeddingFunction():

    emb_fn = embedding_functions.DefaultEmbeddingFunction()
    collection = client.get_or_create_collection(name=db_name, embedding_function=emb_fn)  # This works

But I can't fetch the data with all-MiniLM-L6-v2:

    emb_fn = embedding_functions.SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2")
    collection = client.get_or_create_collection(name=db_name, embedding_function=emb_fn)  # This does not work

What's the difference between DefaultEmbeddingFunction and SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2")?

Versions

ChromaDB v0.5.3, Python v3.10.11, macOS 15.0 Beta (24A5327a)

Relevant log output

No response

tazarov commented 2 months ago

@h3clikejava, thanks for raising this. Let's start with some background:

Under normal circumstances, you should have no trouble swapping between the two: Chroma will accept queries with 384-dimensional embeddings from either function, even though their output embeddings differ slightly (on the order of 1e-4 to 1e-5).
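One way to check the size of that difference yourself is to embed the same text with both functions and compare the vectors. The following is a minimal sketch, not part of Chroma: the helper names `max_abs_diff`, `cosine_similarity`, and `compare_embedding_functions` are made up for illustration. It assumes `chromadb` and `sentence-transformers` are installed and that both embedding functions return one vector per input document. The first call may download model weights.

```python
import math
from typing import Sequence


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two vectors."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    return max(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def compare_embedding_functions(text: str) -> dict:
    """Embed `text` with both functions and report how far apart they are.

    Hypothetical helper: chromadb is imported lazily so the two pure-Python
    helpers above can be used without it.
    """
    from chromadb.utils import embedding_functions

    default_ef = embedding_functions.DefaultEmbeddingFunction()
    st_ef = embedding_functions.SentenceTransformerEmbeddingFunction(
        model_name="all-MiniLM-L6-v2"
    )

    # Convert to plain floats so the comparison works whether the
    # embedding function returns lists or NumPy arrays.
    a = [float(x) for x in default_ef([text])[0]]
    b = [float(x) for x in st_ef([text])[0]]

    return {
        "dimensions": (len(a), len(b)),
        "max_abs_diff": max_abs_diff(a, b),
        "cosine_similarity": cosine_similarity(a, b),
    }
```

Calling `compare_embedding_functions("hello world")` should report matching dimensions and a cosine similarity very close to 1 if the two functions are interchangeable for querying. If the dimensions differ, `max_abs_diff` raises a `ValueError`, which would itself point to the cause of a failed query.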

That said, when you say it won't work, do you mean you get an error? Can you share the error?

jeffchuber commented 2 months ago

@h3clikejava happy to re-open if you can help out! closing for now as @tazarov did a good job addressing it