Open mihail911 opened 1 month ago
This is actually quite trivial via LangChain's EnsembleRetriever: https://python.langchain.com/docs/how_to/ensemble_retriever/
I also wanted to confirm that the hybrid retriever we are using right now (PineconeHybridSearchRetriever) does indeed use a simple weighting of the scores, according to Pinecone's documentation (I second-guessed myself because I had never read it explicitly and had just assumed that was the case).
So the easiest way forward is to use LangChain's EnsembleRetriever if we want reciprocal rank fusion.
Hello @iuliaturc, I've seen what needs to be done and would like to work on this. Please assign this to me.
All yours @aarya-16 :)
Hello @iuliaturc, I have opened PR #87 for this issue. Let me know if it checks out, and also whether you want me to open a separate pull request to add the unit tests for this file. (A separate PR would be nice since it is Hacktoberfest 😄)
Right now, when we combine the outputs of a BM25 encoder and a dense retriever, we simply take a weighted average of their scores. It's more standard to use reciprocal rank fusion, which combines the results by their rank in each list rather than by their raw scores.
We should implement this alternative hybrid scoring method.
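For reference, here is a minimal sketch of reciprocal rank fusion in plain Python. It is not the code from this repo or from any library: the function name `reciprocal_rank_fusion`, the document IDs, and the constant `k=60` (a commonly used default) are assumptions for illustration. Each document gets a score of `1 / (k + rank)` from every ranked list it appears in, and the fused ranking sorts by the sum.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    `rankings` is a list of lists, each ordered best-first.
    `k` dampens the influence of top ranks; 60 is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the best document contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical outputs of the sparse (BM25) and dense retrievers.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)
```

Because only ranks are used, the two retrievers' scores never need to be on a comparable scale, which is the main practical difference from the weighted average.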