chroma-core / chroma

the AI-native open-source embedding database
https://www.trychroma.com/
Apache License 2.0

[Bug]: query doesn't return the nearest embedding #2025

Open DonghaoZ opened 4 months ago

DonghaoZ commented 4 months ago

What happened?

Hi, I noticed that query() doesn't always return the nearest embedding. I used an embedding that is already stored in the database as the query, but its original id was never returned, even after I tried different HNSW params. What should I do to improve this?

query_embedding = sr_collection.get(ids=['2'], include=['metadatas', 'embeddings'])['embeddings'][0]
sr_collection.query(query_embeddings=query_embedding, n_results=10)
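One way to confirm the behavior described above is to compare the ids returned by query() against an exact brute-force search over the same embeddings. The following is a minimal sketch in plain Python, not part of Chroma: the helper names `exact_knn` and `recall` are hypothetical, the toy vectors are made up, and squared L2 distance is an assumption (use whichever distance the collection is configured with).

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_knn(query, embeddings, k):
    """Brute-force search: return the k ids closest to `query`, nearest first.

    `embeddings` maps id -> vector. Hypothetical helper, not a Chroma API.
    """
    scored = sorted((squared_l2(query, vec), id_) for id_, vec in embeddings.items())
    return [id_ for _, id_ in scored[:k]]


def recall(approx_ids, exact_ids):
    """Fraction of the exact neighbors that the approximate result contains."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


# Toy data standing in for the ids and embeddings stored in a collection.
embeddings = {
    "1": [0.0, 0.0],
    "2": [1.0, 1.0],
    "3": [1.1, 0.9],
    "4": [5.0, 5.0],
}

# Query with a stored embedding: the exact search must rank its own id first.
truth = exact_knn(embeddings["2"], embeddings, k=3)
print(truth)  # ['2', '3', '1']
```

With real data, the ids and vectors could be read with sr_collection.get(include=['embeddings']), and the ids from query() passed to `recall` together with `truth`. A recall well below 1.0, or a stored embedding missing from its own results, would point at the approximate index rather than at the data.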

Versions

Chroma 0.4.24, Python 3.9.13, Windows

Relevant log output

No response

jeffchuber commented 4 months ago

@DonghaoZ this could be due to a number of things. how latency sensitive is your application?

jeugregg commented 3 months ago

I observe the same behavior. What do you mean by latency sensitive? Does the algorithm search only part of the documents rather than all of them, depending on the n_results parameter?

DonghaoZ commented 3 months ago

> @DonghaoZ this could be due to a number of things. how latency sensitive is your application?

Not very, if you mean how long I can wait for the function to return the results.

jjerry-k commented 1 month ago

I have the same problem. In my case, a different result is returned starting from n_results = 14.

# n_results = 13
# [['304505', '304476', '511567', '304494', '104391', '348358', '304454', '304462', '511542', '304451', '304474', '675322', '104424']]

# n_results = 14
# [['179843', '179655', '179699', '179723', '179751', '179911', '179710', '179888', '179825', '179739', '179885', '179779', '179724', '179840']]

jjerry-k commented 1 month ago

n_results is used as k in the k-NN query:

https://github.com/chroma-core/chroma/blob/1770d857484774b3690efca804ebed90d3167f96/chromadb/segment/impl/vector/local_hnsw.py#L156C1-L157C75
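Because n_results is passed through as k, an exact k-NN search (with no tied distances) would return the 13-result list as the first 13 entries of the 14-result list. The sketch below checks that property for the two result lists posted earlier in this thread. `compare_results` is a hypothetical helper written for illustration, not a Chroma function.

```python
def compare_results(smaller, larger):
    """Compare two ranked id lists returned for different n_results values."""
    k = len(smaller)
    return {
        # True if the larger result simply extends the smaller one.
        "is_prefix": larger[:k] == smaller,
        # Ids that appear in both results, regardless of rank.
        "shared": sorted(set(smaller) & set(larger)),
    }


# Result ids for n_results = 13 and n_results = 14, copied from the comment above.
ids_13 = ['304505', '304476', '511567', '304494', '104391', '348358', '304454',
          '304462', '511542', '304451', '304474', '675322', '104424']
ids_14 = ['179843', '179655', '179699', '179723', '179751', '179911', '179710',
          '179888', '179825', '179739', '179885', '179779', '179724', '179840']

report = compare_results(ids_13, ids_14)
print(report)  # {'is_prefix': False, 'shared': []}
```

The two lists share no ids at all, so at least one of the two queries cannot be returning the true nearest neighbors.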