Stevenic / vectra

Vectra is a local vector database for Node.js with features similar to Pinecone but built using local files.
MIT License

[Feature Request] Hybrid Search for Keywords #50

Open GregorBiswanger opened 6 months ago

GregorBiswanger commented 6 months ago

I have a feature request for the Vectra library. Currently, I have difficulty getting accurate results when I search my data for occurrences of a single word. Other vector databases support a hybrid search that includes keyword search to achieve more precise results.

Feature Request: I would like to see a hybrid search implemented that includes both traditional vector search and keyword search. This would significantly improve the accuracy of search results, especially for data containing frequently occurring words.

Benefit: Such a feature would enable more precise and relevant search results by combining the strengths of both search methods. This is particularly useful in cases where vector search alone is insufficient to find relevant results.
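To make the idea concrete, here is a minimal sketch of one way the two signals could be blended: a weighted sum of the vector similarity and a simple keyword-overlap score. Everything in it (`ScoredDoc`, `keywordScore`, `hybridRank`, the `alpha` weight) is a hypothetical name chosen for illustration and is not part of Vectra's API. It also assumes the vector scores are already roughly in the range 0 to 1.

```typescript
// Hypothetical shape for a vector-search hit; NOT Vectra's actual result type.
interface ScoredDoc {
  id: string;
  text: string;
  vectorScore: number; // assumed to be a similarity in roughly [0, 1]
}

// Crude keyword signal: the fraction of distinct query terms found in the text.
function keywordScore(query: string, text: string): number {
  const tokenize = (s: string): string[] => s.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  const queryTerms = Array.from(new Set(tokenize(query)));
  if (queryTerms.length === 0) return 0;
  const docTerms = new Set(tokenize(text));
  const hits = queryTerms.filter((term) => docTerms.has(term)).length;
  return hits / queryTerms.length;
}

// Blend both signals and sort best-first.
// alpha = 1 is pure vector search; alpha = 0 is pure keyword search.
function hybridRank(
  query: string,
  docs: ScoredDoc[],
  alpha = 0.5
): Array<ScoredDoc & { hybridScore: number }> {
  return docs
    .map((doc) => ({
      ...doc,
      hybridScore: alpha * doc.vectorScore + (1 - alpha) * keywordScore(query, doc.text),
    }))
    .sort((a, b) => b.hybridScore - a.hybridScore);
}
```

With a blend like this, a document that contains the exact search word can outrank one that is only semantically close, which is the behavior I'm missing today.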

Examples:

Thank you for your great work on Vectra and for considering this feature request!

Cheers, Gregor

Stevenic commented 6 months ago

I don't disagree, but this is a pretty big add. It would mean adding a keyword index, which would mean loading two indexes into memory. I've thought a lot about this myself, and I've just been reluctant to add the additional complexity (and memory hit). I'm open to ideas for simple ways to implement hybrid search that don't involve a big memory hit or add a lot of complexity.

GaureeshAnvekar commented 1 week ago

@Stevenic, thanks for the clarification! One option is to integrate the BM25 keyword-matching algorithm, but we would also need to store the text extracted from documents/URLs. We can always keep hybrid search optional, both during indexing and querying. I'm looking into it currently. Thanks again for this cool repo!
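For reference, here is a rough, self-contained sketch of BM25 scoring over stored chunk text. It is only an illustration under stated assumptions, not a proposed implementation for Vectra: the function names (`bm25Tokenize`, `bm25Scores`) are hypothetical, the tokenizer is a naive lowercase alphanumeric split, the parameters `k1 = 1.2` and `b = 0.75` are common defaults, and the IDF uses the non-negative `ln(1 + (N - n + 0.5) / (n + 0.5))` variant.

```typescript
// Naive tokenizer (an assumption): lowercase runs of ASCII letters and digits.
function bm25Tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Returns one BM25 score per document, in the same order as `docs`.
function bm25Scores(query: string, docs: string[], k1 = 1.2, b = 0.75): number[] {
  const tokenized = docs.map(bm25Tokenize);
  const avgLen =
    tokenized.reduce((sum, tokens) => sum + tokens.length, 0) / Math.max(docs.length, 1);

  // Document frequency: in how many documents each term appears.
  const df = new Map<string, number>();
  tokenized.forEach((tokens) => {
    new Set(tokens).forEach((term) => df.set(term, (df.get(term) ?? 0) + 1));
  });

  const queryTerms = Array.from(new Set(bm25Tokenize(query)));

  return tokenized.map((tokens) => {
    // Term frequency within this document.
    const tf = new Map<string, number>();
    tokens.forEach((term) => tf.set(term, (tf.get(term) ?? 0) + 1));

    let score = 0;
    queryTerms.forEach((term) => {
      const f = tf.get(term) ?? 0;
      if (f === 0) return; // term absent from this document
      const n = df.get(term) ?? 0;
      const idf = Math.log(1 + (docs.length - n + 0.5) / (n + 0.5));
      const lengthNorm = 1 - b + (b * tokens.length) / avgLen;
      score += (idf * f * (k1 + 1)) / (f + k1 * lengthNorm);
    });
    return score;
  });
}
```

This version recomputes term statistics on every call, so it keeps no second index in memory, but it does need the chunk text to be available at query time. One way to keep the cost down would be to score only the top-K chunks returned by the vector query instead of the whole corpus.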