Open ankitas3 opened 1 year ago
Sentence embeddings cannot match exact keywords, because the search happens in a vector space rather than over the statistical properties of terms. If you want both semantic and keyword search, you can perform a hybrid search. However, if you are always searching for exact terms, a classical algorithm such as BM25 is the better choice.
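To make the hybrid idea concrete, here is a minimal sketch in plain Python. It scores documents with a small hand-written BM25 and blends that with semantic scores that are assumed to come from elsewhere (for example, cosine similarities from the embedding model). The function names, the whitespace tokenizer, the min-max normalization, and the `alpha` weight are all illustrative assumptions, not anything prescribed by a particular library.

```python
import math
from collections import Counter

PUNCT = ".,;:!?()\""

def tokenize(text):
    # Assumption: split on whitespace so tokens such as "$200", "Q&A" or
    # "Legal-Compliance" stay intact; only surrounding punctuation is stripped.
    tokens = (t.strip(PUNCT).lower() for t in text.split())
    return [t for t in tokens if t]

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document (docs must be non-empty)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in tokenized for term in set(d))
    query_terms = tokenize(query)
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def min_max(values):
    # Rescale to [0, 1] so keyword and semantic scores are comparable.
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]

def hybrid_scores(keyword_scores, semantic_scores, alpha=0.5):
    # alpha weights the keyword side; (1 - alpha) weights the semantic side.
    kw, sem = min_max(keyword_scores), min_max(semantic_scores)
    return [alpha * k + (1 - alpha) * s for k, s in zip(kw, sem)]
```

With this tokenizer, a query such as `$200` or `Q&A` only scores documents that contain that exact token, which is the behavior the embedding model alone cannot guarantee. The right value of `alpha` depends on the data and would need to be tuned.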
Do we have any workaround for exact search to work?
I’m using one of the Hugging Face models, sentence-transformers/all-MiniLM-L6-v2, for semantic search. Currently I'm having trouble searching for exact keywords. This is required when searching for the following: a person’s name (John Davis), a specific ID or number (2023), or a keyword containing special characters (Legal-Compliance, Year’23, $200, Q&A).
My documents are long (more than 500 words), so to create the embeddings each document is split into an array of sentences of length 100 each, which are then encoded and averaged.
These embeddings are then stored and searched using OpenSearch, which currently returns irrelevant or less relevant results at the top.
Can someone help me with this? Is this the correct way to combine/average out the embeddings? How do I search for numbers and keywords with special characters here?
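For reference, the chunk-and-average step described above might look roughly like the following sketch. It assumes "length 100" means 100 words per chunk, and it works on vectors that have already been produced by some encoder (the encoder itself is left out so the code stays self-contained). `best_chunk_score` is a hypothetical alternative added for comparison: it keeps one vector per chunk and scores a document by its best-matching chunk instead of by the average.

```python
import math

def chunk_words(text, size=100):
    # Assumption: chunks are consecutive runs of `size` whitespace-separated words.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def mean_pool(vectors):
    # Average the chunk vectors element-wise, then L2-normalize the result.
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean)) or 1.0
    return [x / norm for x in mean]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def best_chunk_score(query_vec, chunk_vecs):
    # Hypothetical alternative: score the document by its closest chunk.
    return max(cosine(query_vec, v) for v in chunk_vecs)
```

One thing this makes visible: if a query matches a single chunk perfectly, the averaged vector still scores lower, because the other chunks pull the mean away. Storing one embedding per chunk is a common way around that, at the cost of more vectors in the index.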
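As a starting point for the last question, here is a rough sketch of how an OpenSearch index and query could keep special characters searchable alongside the vectors. It only builds the request bodies as Python dictionaries; sending them with a client is left out. The field names `content` and `embedding`, the analyzer name `exact_terms`, and the helper function names are made up for illustration. The dimension of 384 matches all-MiniLM-L6-v2, and the whitespace tokenizer is an assumption chosen so that tokens like `$200` or `Q&A` are not split apart.

```python
def build_index_body(dim=384):
    """Index with a whitespace-analyzed text field and a k-NN vector field."""
    return {
        "settings": {
            "index": {"knn": True},
            "analysis": {
                "analyzer": {
                    # Hypothetical analyzer name: split on whitespace only,
                    # then lowercase, so "Legal-Compliance" stays one token.
                    "exact_terms": {
                        "type": "custom",
                        "tokenizer": "whitespace",
                        "filter": ["lowercase"],
                    }
                }
            },
        },
        "mappings": {
            "properties": {
                "content": {"type": "text", "analyzer": "exact_terms"},
                "embedding": {"type": "knn_vector", "dimension": dim},
            }
        },
    }

def build_query_body(query_text, query_vector, k=10):
    """Combine a keyword match and a k-NN clause in one bool query."""
    return {
        "size": k,
        "query": {
            "bool": {
                "should": [
                    {"match": {"content": {"query": query_text}}},
                    {"knn": {"embedding": {"vector": query_vector, "k": k}}},
                ]
            }
        },
    }
```

Two caveats with this sketch. The whitespace tokenizer also keeps trailing punctuation attached ("session." is not the same token as "session"), so some text cleanup before indexing may be needed. And the BM25 and k-NN scores are on different scales, so a plain `bool` combination may need boosts or a separate normalization step; the OpenSearch documentation for your version is the place to check which hybrid scoring options are available.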