chroma-core / chroma

the AI-native open-source embedding database
https://www.trychroma.com/
Apache License 2.0

[Bug]: collection.modify(metadata={"hnsw:space": "..."}) doesn't work #1168

Open bastiennes opened 1 year ago

bastiennes commented 1 year ago

### What happened?

When I call `collection.modify` after creating a collection and adding an item to it, the distance function does not change. Here is a short script that reproduces the bug: vary the `hnsw:space` value passed to `create_collection` and to `collection.modify`, then compare the distance scores for the default function (l2) and for cosine.


```python
import chromadb

client = chromadb.EphemeralClient()
# The distance function is chosen here, at creation time.
collection = client.create_collection(
    name="doc_analyzer",
    metadata={"hnsw:space": "cosine"},
)
sentences = ["This is a test sentence"]
ids = ["test_id"]
collection.add(
    documents=sentences,
    ids=ids,
)

# Attempt to switch the distance function after the fact.
collection.modify(metadata={"hnsw:space": "l2"})

# Insertion works fine.
question = "This is a test"
response = collection.query(
    query_texts=[question],
    n_results=1,
)

for doc, sim in zip(response["documents"][0], response["distances"][0]):
    print(sim, doc)
```

- **score for l2 init only: 0.390127569437027**
- **score for l2 init & cosine modify: 0.390127569437027**
- **score for cosine init only: 0.1950635313987732**
- **score for cosine init & l2 modify: 0.1950635313987732**
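
The l2 score above is almost exactly twice the cosine score. That is what one would expect if two things hold, both of which are assumptions here rather than something confirmed in this thread: the embeddings are scaled to unit length, and the `l2` space reports the squared Euclidean distance. Under those assumptions, |u - v|² = 2(1 - cos(u, v)). A minimal sketch with two made-up vectors (not real embeddings):

```python
import math


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus the cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


# Arbitrary illustrative vectors, scaled to unit length (assumption:
# the embedding model in use does the same).
u = normalize([1.0, 2.0, 3.0])
v = normalize([2.0, 1.0, 4.0])

l2 = squared_l2(u, v)
cos = cosine_distance(u, v)

# For unit vectors the ratio is 2, up to floating-point error.
print(l2, cos, l2 / cos)
```

If that relation applies here, the two spaces would rank results identically for this model, and the factor of two in the reported scores reflects which space the collection was created with, not any effect of `modify`.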

### Versions

ChromaDB 0.4.10, Python 3.11.4, Ubuntu 22.04

### Relevant log output

_No response_
bastiennes commented 1 year ago

What's more, most commonly used embedding models (notably ChromaDB's default model, all-MiniLM-L6-v2) are optimized for cosine similarity rather than l2. It would be worth making cosine the default distance function in ChromaDB instead of l2.

HammadB commented 1 year ago

You can't change the distance used after creating the collection. We can add better messaging around this.
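
Since the distance is fixed at creation, one possible workaround is to copy the data into a new collection created with the desired `hnsw:space`. The sketch below is only an illustration: `recreate_with_space` is a hypothetical helper, not part of chromadb, and it assumes the `get`, `create_collection`, and `add` calls behave as they do in the script from the report. It does not carry over the source collection's own metadata or a custom embedding function.

```python
def recreate_with_space(client, source, new_name, space):
    """Copy `source` into a new collection that uses `space` as its distance.

    Hypothetical helper for illustration; not part of chromadb.
    """
    # Read everything back, including the stored embeddings, so that
    # nothing has to be re-embedded.
    data = source.get(include=["embeddings", "documents", "metadatas"])

    # The distance function is set here, at creation time.
    target = client.create_collection(
        name=new_name,
        metadata={"hnsw:space": space},
    )

    # Assumption: if no item carries metadata, it is safer to pass None
    # than a list of None values.
    metadatas = data["metadatas"]
    if metadatas is not None and all(m is None for m in metadatas):
        metadatas = None

    # Skip the add call entirely for an empty collection.
    if len(data["ids"]) > 0:
        target.add(
            ids=data["ids"],
            embeddings=data["embeddings"],
            documents=data["documents"],
            metadatas=metadatas,
        )
    return target
```

With the script from the report, this could be called as `recreate_with_space(client, collection, "doc_analyzer_l2", "l2")` (the new name is made up), and queries would then go to the returned collection.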

reaganjlee commented 10 months ago

Relevant to #1052 and PR #1461