Raudaschl / rag-fusion


Do multiple queries make sense in a vector space? #7

Open DavidGOrtega opened 8 months ago

DavidGOrtega commented 8 months ago

👋

Interesting work! However, I do not fully understand the need for generating multiple queries when doing a vector search. Once the query is transformed into a vector, the search is latent-space centric, so it will retrieve documents close to our vector. The rewritten queries will be semantically similar, and I would not expect much benefit from this. The downside is cost-effectiveness: we have to multiply the number of queries and vector calculations for each query.

However, a completely different scenario would be generating the different queries for purely text-based search DBs, where having the same query reformulated, even with different words, could retrieve semantically similar documents and improve the results.

Have you made any comparison tests?

WDYT?

Raudaschl commented 8 months ago

Hi David,

Your question delves into the core of our methodological approach, and I appreciate the opportunity to clarify the rationale behind generating multiple queries in a vector search context :-)

The primary reason for employing multiple queries, despite the apparent semantic similarity post-vector transformation, is twofold.

First, it aims to capture the nuanced facets of complex queries that a single vector representation might not fully encapsulate. While it’s true that transforming a query into a vector places it in a latent space where documents of close proximity are retrieved, this process might not always grasp the multi-dimensional aspects of certain queries. By generating variations, we aim to explore these latent spaces more thoroughly, ensuring a broader yet still relevant coverage of potential results.

Second, and crucially, generating multiple similar queries and observing the consistency in results serves to build confidence in the robustness and reliability of the search outcomes. When similar but subtly different queries yield the same results, it underscores the search algorithm’s capability to successfully cover multiple perspectives and interpretations of a query, enhancing user trust in the system’s comprehensiveness and accuracy.
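To make the "exploring the latent space more thoroughly" point concrete, here is a minimal sketch (toy 2-D vectors stand in for real embeddings, and `cosine_top_k` is a hypothetical helper, not code from this repo) of how two query variants can land in different regions of the space and together retrieve a broader document set than either alone:

```python
import numpy as np

def cosine_top_k(query_vec, doc_matrix, k=2):
    """Return indices of the k documents closest to query_vec (cosine)."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    scores = d @ q
    return list(np.argsort(-scores)[:k])

# Toy 2-D "embeddings": each query variant emphasises a different
# facet of the information need and so pulls in different documents.
docs = np.array([[1.0, 0.0], [0.9, 0.4], [0.0, 1.0], [0.5, 0.9]])
variants = [np.array([1.0, 0.1]),   # variant emphasising facet A
            np.array([0.2, 1.0])]   # variant emphasising facet B

retrieved = set()
for v in variants:
    retrieved.update(cosine_top_k(v, docs, k=2))
print(sorted(retrieved))  # union covers all four documents
```

Each variant alone only reaches two of the four documents; the union of both searches covers all of them.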

The concern regarding cost-effectiveness is certainly valid. However, our approach has been to optimise the vector calculation (small language model) and query generation processes (ChatGPT 3.5) to mitigate these concerns. Preliminary results have indicated that the slight increase in cost is offset by the significant improvement in the relevance and breadth of retrieved documents, especially in highly specialised or nuanced fields.

To your point about text-based search databases, you are correct in noting the distinct advantages of query variation in such contexts.

Regarding comparison tests, we are currently in the process of conducting comprehensive evaluations to quantify the benefits of our approach. These tests are designed to measure the improvement in relevance, coverage, user satisfaction, and the added value of consensus through redundancy against traditional single-query vector searches.

I anticipate publishing our findings soon and am optimistic about the potential implications for both academic research and practical applications.

Thanks again for your thoughtful query. It’s much appreciated, and I’m happy to answer any other questions.

Best, Adrian Raudaschl


DavidGOrtega commented 8 months ago

Hi Adrian,

Thanks for your extremely detailed response. Continuing with the discussion and its possible implications: once the vectors are generated, would it not suffice to run one query, aggregating the vectors by averaging them, or even better with attention? I mean, instead of doing 5 queries to each source, we do 1 query with that aggregated vector to each source.
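For what it's worth, a toy sketch of the averaging idea (same kind of hypothetical 2-D setup as before, not code from this repo) suggests why a single centroid query can behave differently from separate variant queries: the mean vector drifts toward documents lying between the variants and can miss the extremes each variant would rank first:

```python
import numpy as np

def cosine_top_k(query_vec, doc_matrix, k=2):
    """Return indices of the k documents closest to query_vec (cosine)."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    return list(np.argsort(-(d @ q))[:k])

docs = np.array([[1.0, 0.0], [0.9, 0.4], [0.0, 1.0], [0.5, 0.9]])
variants = np.array([[1.0, 0.1],    # would rank docs 0 and 1 first
                     [0.2, 1.0]])   # would rank docs 2 and 3 first

# One search with the mean vector: it pulls toward the centroid and
# retrieves the "in-between" documents instead of the extremes.
mean_vec = variants.mean(axis=0)
top = sorted(cosine_top_k(mean_vec, docs, k=2))
print(top)
```

Here the averaged query retrieves documents 1 and 3, the two that sit between the variants, while documents 0 and 2, each variant's own top hit, drop out of the top-k.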

Also, if we want to stay close to user intent, should the original query not score higher than the alternative ones?
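One way to encode that preference is to weight each ranked list inside Reciprocal Rank Fusion, giving the user's original query a higher multiplier than the rewrites. A minimal sketch (the doc ids and weights are made up for illustration; this is not the repo's implementation):

```python
from collections import defaultdict

def weighted_rrf(ranked_lists, weights, k=60):
    """Reciprocal Rank Fusion with a per-list weight.

    ranked_lists: one ranked list of doc ids per query.
    weights: one multiplier per list, e.g. a higher weight
             for the user's original query than for rewrites.
    """
    scores = defaultdict(float)
    for ranking, w in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] += w / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical rankings: original query first, two rewrites after.
original  = ["d1", "d2", "d3"]
rewrite_a = ["d4", "d2", "d1"]
rewrite_b = ["d4", "d3", "d2"]
fused = weighted_rrf([original, rewrite_a, rewrite_b],
                     weights=[2.0, 1.0, 1.0])
print(fused)
```

With these toy lists, "d2" wins overall: it appears in every ranking and gets the extra boost from the double-weighted original query, while "d4", first in both rewrites but absent from the original, ends up last.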

LucasLee-ff commented 2 months ago

I think that when using not only vector search but also Elasticsearch or other methods, generating multiple queries would surely be a good way to increase recall.

By the way, your idea sounds interesting. Have you conducted any experiments on this?

Raudaschl commented 2 months ago

Hi @LucasLee-ff ,

Thanks for your insightful comment! I completely agree that generating multiple queries can significantly enhance recall, especially in hybrid systems that leverage both vector search and traditional lexical methods or even graph retrieval. By broadening the search surface area, we can tap into both the semantic richness of vector search and the symbolic precision of other techniques to improve retrieval quality.

Regarding evaluations, we've actually conducted internal testing and found that the multi-query generation method has greatly enhanced the outputs of our product, Scopus AI. It has been particularly beneficial for increasing the relevance and breadth of retrieved documents in more complex, nuanced searches. However, we'd like to publish results showing improvements across multiple domains to validate the broader applicability of this approach.

There are already a few authors, like Zackary Rackauckas (referenced in this paper), who have done interesting evaluation testing in similar contexts using RAG Fusion. We’re keen to build on this and show how multi-query generation, and techniques like Reciprocal Rank Fusion (RRF), can boost search quality across various use cases.

I’ll keep you updated as we gather more data and aim to publish something more comprehensive. In the meantime, I’d love to hear more about your thoughts on how this could be applied in other domains.

LucasLee-ff commented 2 months ago

@Raudaschl Thanks for your detailed explanation!

There’s something that has always been on my mind, but I’m not sure if you’ve noticed: when we use a hybrid retrieval system, the fusion strategy of the retrieval results can greatly affect the outcome. If we use different queries and different retrieval sources at the same time, there are two strategies when fusing the retrieval results:

  1. For each retrieval source, use RRF to fuse the results obtained from different queries. After that, we rank the results from different sources with other general methods (such as vector similarity or reranker models).
  2. For each query, use RRF to fuse the results from different retrieval sources, and then rank the results from different queries.
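The two orderings can genuinely disagree, even with plain RRF at both stages. A small sketch (toy doc ids; k=60 as in the original RRF paper; `results` is a made-up `source -> query -> ranking` table) shows the fused rankings diverging:

```python
from collections import defaultdict

def rrf(ranked_lists, k=60):
    """Plain Reciprocal Rank Fusion over several ranked lists."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] += 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical retrieval results: results[source][query] -> ranked ids.
results = {
    "vector":  {"q1": ["d1", "d2"], "q2": ["d2", "d3"]},
    "lexical": {"q1": ["d3", "d1"], "q2": ["d1", "d4"]},
}

# Strategy 1: fuse across queries within each source,
# then fuse the per-source rankings.
per_source = [rrf(list(qs.values())) for qs in results.values()]
strategy1 = rrf(per_source)

# Strategy 2: fuse across sources for each query,
# then fuse the per-query rankings.
queries = ["q1", "q2"]
per_query = [rrf([results[s][q] for s in results]) for q in queries]
strategy2 = rrf(per_query)

print(strategy1)
print(strategy2)
```

On this toy data the two strategies agree on the winner but swap the second and third places, which is exactly the kind of divergence that makes the choice of fusion order matter downstream.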

Have you conducted any experiments like this?

Raudaschl commented 1 week ago

Hey @LucasLee-ff ,

Great questions, thanks for raising them! There was some initial experimentation as we tried to find the optimal setup for RAG Fusion, and as we think about incorporating other retrieval techniques, this has certainly been something on my mind.

Both strategies, we believe, have their merits and challenges:

RRF per Retrieval Source, then Rank Across Sources:

RRF per Query Across Sources, then Rank Across Queries:

Right now, we try to minimise these challenges by ensuring that different queries are quite similar, aiming to build consensus with similar questions. If we were to include lexical or other retrieval results, we’d likely need a way to provide additional weighting for those if their presence in the top list was important. This might require a universal score reranker, like a cross-encoder, to harmonise and prioritise results across sources effectively.
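The universal-score idea can be sketched roughly as follows: pool the candidates from all sources, then rank everything by one shared relevance score so that vector and lexical results become comparable. The token-overlap scorer below is just a stand-in for illustration; a real system would call a cross-encoder model on the (query, doc) pairs instead:

```python
def rerank_with_universal_scores(query, candidates, score_fn):
    """Pool candidates from all retrieval sources, then rank them
    by a single shared relevance score. score_fn stands in for a
    cross-encoder scoring (query, doc) pairs."""
    pooled = sorted(set(candidates))                  # dedupe across sources
    scored = [(score_fn(query, d), d) for d in pooled]
    return [d for s, d in sorted(scored, reverse=True)]

# Placeholder scorer: fraction of query tokens found in the doc.
# A real system would substitute a cross-encoder's predictions here.
def overlap_score(query, doc):
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

candidates = ["rank fusion for search", "cats and dogs",
              "reciprocal rank fusion"]
ranked = rerank_with_universal_scores(
    "reciprocal rank fusion", candidates, overlap_score)
print(ranked)
```

Because every candidate is scored on the same scale, results from heterogeneous sources no longer need their native scores reconciled; the reranker's output alone decides the final order.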

Would love to hear how you’ve approached this balance in your work!