chroma-core / chroma

the AI-native open-source embedding database
https://www.trychroma.com/
Apache License 2.0

[Feature Request]: Text Search (BM25) and Embedding Search in a single collection.query #2633

Open srinivas-sateesh opened 1 month ago

srinivas-sateesh commented 1 month ago

Describe the problem

When I call collection.query with both query_texts and query_embeddings, I get the following error: ValueError: You must provide one of query_embeddings, query_texts, query_images, or query_uris.

I know you have where_document, but that is not what I need. What I need is hybrid search.

Describe the proposed solution

Achieve high-quality retrieval by combining vector search and full-text search, with rich metadata filters and custom reranking.

Alternatives considered

No response

Importance

i cannot use Chroma without it

Additional Information

No response

jeffchuber commented 1 month ago

@srinivas-sateesh query_texts handles the embedding for you; it is equivalent to providing query_embeddings yourself. Chroma doesn't have BM25 (yet), but as you noted, where_document does use full-text search.

Can you tell me more about your specific desire for BM25? It is something we are thinking about adding.

Eknathabhiram commented 1 month ago

@srinivas-sateesh @jeffchuber I have been looking for the same thing today. It would be a lot easier to implement hybrid search if Chroma had built-in BM25 (basically, lexical search).
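
Until something like this is built in, one common client-side workaround is to run the vector query and a lexical query separately and merge the two result lists. The snippet below is a minimal sketch of reciprocal rank fusion (RRF), one possible merging rule; it is not part of Chroma's API. The function name, the document ids, and the constant k=60 are all assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by several retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical ids: one list from an embedding query, one from a lexical index.
vector_ranking = ["doc3", "doc1", "doc2"]
lexical_ranking = ["doc1", "doc4", "doc3"]

fused = reciprocal_rank_fusion([vector_ranking, lexical_ranking])
print(fused)
```

RRF only needs ranks, not scores, so it avoids having to normalize embedding distances against BM25 scores.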

srinivas-sateesh commented 1 month ago

User queries contain a significant percentage of domain-specific terms and jargon, so BM25 will add value.
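
To illustrate why lexical scoring helps with jargon, here is a minimal pure-Python sketch of Okapi BM25. It is not Chroma code: the function name, the whitespace tokenization, the sample corpus, and the parameter defaults (k1=1.5, b=0.75) are assumptions chosen for the example, and it expects a non-empty corpus.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Return one BM25 score per document for a tokenized query."""
    n_docs = len(corpus_tokens)
    avg_len = sum(len(doc) for doc in corpus_tokens) / n_docs

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for doc in corpus_tokens:
        df.update(set(doc))

    scores = []
    for doc in corpus_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Rare terms (low df) receive a high inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b normalizes by document length.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


corpus = [
    "the patient was given a beta blocker".split(),
    "the server returned a timeout error".split(),
    "metoprolol is a beta blocker for hypertension".split(),
]
scores = bm25_scores("metoprolol dosage".split(), corpus)
best = max(range(len(corpus)), key=lambda i: scores[i])
print(best, scores)
```

Only the third document contains the exact term "metoprolol", so it is the only one with a non-zero score. An embedding model that has not seen the term may not separate these documents as cleanly.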

bash99 commented 1 month ago

hybrid search

For hybrid search, I think it would be really nice to have a sparse vector index in Chroma.
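
As a rough sketch of what sparse vectors would allow, the snippet below stores each sparse vector as a dict mapping a term id to a weight, computes their dot product, and blends it with a dense similarity. Everything here is hypothetical: the function names, the term ids and weights, and the linear blend with alpha are assumptions for illustration, not an existing or planned Chroma API.

```python
def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as {term_id: weight} dicts."""
    # Iterate over the smaller vector so the cost depends on its size only.
    if len(a) > len(b):
        a, b = b, a
    return sum(weight * b[term_id] for term_id, weight in a.items() if term_id in b)


def hybrid_score(dense_sim, sparse_sim, alpha=0.5):
    """Linear blend of a dense similarity and a sparse (lexical) similarity."""
    return alpha * dense_sim + (1 - alpha) * sparse_sim


# Hypothetical term ids and weights, e.g. BM25 or learned sparse weights.
query_sparse = {17: 1.2, 42: 0.8}
doc_sparse = {42: 0.5, 99: 2.0}

sparse_sim = sparse_dot(query_sparse, doc_sparse)  # only term 42 overlaps
combined = hybrid_score(dense_sim=0.9, sparse_sim=sparse_sim, alpha=0.5)
print(sparse_sim, combined)
```

A linear blend assumes the two similarities are on comparable scales, so in practice the scores would usually need normalizing first.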