RAG returns the same chunk twice (e.g. from category_dodeeric and category_history_belgian_monarchy)

dodeeric / langchain-ai-assistant-with-hybrid-rag

This is an LLM chatbot built with LangChain. The web interface is built with Streamlit. It implements hybrid RAG (keyword and semantic search) and chat memory.

https://bmae-ragai-webapp.azurewebsites.net

GNU General Public License v3.0


Issue #11

Closed: dodeeric closed this issue 1 month ago

dodeeric commented 1 month ago

Example: the fire at the Château de Laeken ("incendie du château de Laeken").

dodeeric commented 1 month ago

The ensemble retriever removes duplicate results. If k=5 for BM25 and k=5 for the vector DB, the ensemble can end up returning fewer than 5 results, because each retriever's top 5 may already contain the same chunk more than once:

bm25: result1 result1 result2 result2 result3

vector db: result1 result1 result2 result2 result4

ensemble: result1 result2 result3 result4

I saw such a case in the LangSmith logs.
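To make the counting above concrete, here is a minimal pure-Python sketch of how merging two ranked lists with deduplication can yield fewer than k results. It is not the LangChain implementation: the function name `fuse_rankings`, the constant `c=60`, and the use of weighted reciprocal rank fusion on plain strings are assumptions made for illustration only.

```python
from collections import defaultdict


def fuse_rankings(rankings, weights=None, c=60):
    """Merge ranked lists with weighted reciprocal rank fusion.

    Hypothetical helper for illustration: each distinct item is kept once,
    so the output can be shorter than any single input list.
    """
    if weights is None:
        weights = [1.0] * len(rankings)

    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        seen = set()
        rank = 0
        for item in ranking:
            if item in seen:
                continue  # skip a chunk this retriever already returned
            seen.add(item)
            rank += 1
            scores[item] += weight / (rank + c)

    # Highest fused score first; ties keep first-seen order (sort is stable).
    return sorted(scores, key=scores.get, reverse=True)


# The example from this issue: k=5 per retriever, with duplicated chunks.
bm25 = ["result1", "result1", "result2", "result2", "result3"]
vector_db = ["result1", "result1", "result2", "result2", "result4"]

ensemble = fuse_rankings([bm25, vector_db])
print(ensemble)
print(len(ensemble))
```

With these inputs only four distinct chunks exist across both lists, so the merged list has four entries even though each retriever was asked for five. Deduplicating the indexed chunks themselves (so the same text is not stored under two categories) would avoid losing result slots this way.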