deepset-ai / haystack

:mag: AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

Improve ranking with semantic document search #1686

Closed: Omarnabk closed this issue 2 years ago

Omarnabk commented 2 years ago

Question: I'm using Haystack to search a massive website, including web pages, documents, and social network pages related to that website.

The website covers several topics, one of which is IoT (Internet of Things). However, because IoT is the main topic, many documents contain that keyword (iot).

When I apply semantic search and ranking, I get reasonable results, but not the main IoT page of my website. By comparison, Google returns that main IoT page, whereas semantic similarity returns documents that are closer to the keyword "iot".

I know that Google is much more complicated than semantic comparison, but I was wondering how we could improve the Haystack search so that it returns the main page first, for example by including a ranking signal such as the PageRank algorithm.

brandenchan commented 2 years ago

Hey @Omarnabk, I think that the best way is to prioritise titles over other text fields via the custom pipeline method that we discussed in #1669.

I also want to make sure that you are aware of the retriever's filtering function. If you know in advance which page you want to prioritise, you can filter for just that one (see the Retriever Documentation for an example of a filter).

You could also implement your own reranking at the very end to prioritise documents from pages whose title or URL has word overlap with the query terms. For this you could either define a new Node or simply consume the output of the pipeline and reorder the results.
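As a rough illustration of the second option (reordering the pipeline output outside of Haystack), here is a minimal sketch. It does not use any Haystack API: the results are assumed to be plain dictionaries with a `score` and a `meta` dictionary holding `title` and `url`, and the function name `rerank_by_overlap`, the `boost` weight, and the example URLs are all hypothetical. You would need to adapt the field names to whatever your pipeline actually returns.

```python
import re
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rerank_by_overlap(query: str, docs: List[Dict], boost: float = 0.5) -> List[Dict]:
    """Reorder docs by retriever score plus a bonus for query/title/URL word overlap.

    `docs` is an assumed shape, not a Haystack type:
    {"score": float, "meta": {"title": str, "url": str}}.
    """
    query_terms = tokenize(query)

    def boosted_score(doc: Dict) -> float:
        meta = doc.get("meta", {})
        field_terms = tokenize(meta.get("title", "") + " " + meta.get("url", ""))
        # Fraction of query terms that also appear in the title or URL.
        overlap = len(query_terms & field_terms) / len(query_terms) if query_terms else 0.0
        return doc.get("score", 0.0) + boost * overlap

    # sorted() is stable, so ties keep the original retriever order.
    return sorted(docs, key=boosted_score, reverse=True)


# Made-up example results; the scores and URLs are illustrative only.
results = [
    {"score": 0.82, "meta": {"title": "Smart sensors in agriculture",
                             "url": "https://example.com/blog/smart-sensors"}},
    {"score": 0.74, "meta": {"title": "IoT",
                             "url": "https://example.com/iot"}},
]

reranked = rerank_by_overlap("iot", results)
for doc in reranked:
    print(doc["meta"]["url"])
```

With these made-up scores, the page whose title and URL contain the query term moves ahead of the page with the higher semantic score. The `boost` value controls how strongly overlap outweighs the retriever score and would need tuning on real queries.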

ZanSara commented 2 years ago

Hello @Omarnabk, do you need more help on this topic? If so, please feel free to ask. Otherwise, I'll close this issue in a few days :+1: