Custom RAG configuration not respected for retrievers

KruxAI / ragbuilder

A toolkit to create an optimal, production-ready Retrieval Augmented Generation (RAG) setup for your data

https://docs.ragbuilder.io

Apache License 2.0


Custom RAG configuration not respected for retrievers #69

Open loiccordone opened 1 month ago

loiccordone commented 1 month ago

Hello! Thanks for RAGbuilder, it's a very nice project.

I am using the custom RAG configuration. Even though I explicitly select only "Vector DB - Similarity Search" as the retriever, most runs use "Vector DB - MMR search" and/or "BM25 Search" instead.

I don't have access to MMR search or BM25 with my vector DB (Azure CosmosDB for NoSQL), which is why I only want to use Similarity Search. I'm using Bayesian optimization, but it also happened with "Run all combinations". Am I doing something wrong, or is there a bug? I pulled the latest 0.20.0 image and see the same issue.

Thank you!

aravind10x commented 1 month ago

Hi @loiccordone, thanks for reporting this. It is a bug, and we will ship a patch today.

loiccordone commented 1 month ago

Hi @aravind10x, any news on this issue? Thank you!

aravind10x commented 1 month ago

Hi @loiccordone, yes, we are shipping the fix with this PR and will publish it as a new release shortly.

One thing to note: the current BM25 retriever implementation does not depend on the vector database (see the BM25 retriever in langchain). So you may not need to deselect it, even if your vector DB (Azure CosmosDB for NoSQL) doesn't support BM25.
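To illustrate why BM25 does not need a vector database, here is a minimal pure-Python sketch of Okapi BM25 ranking over an in-memory list of texts. It is not the langchain implementation or RAGbuilder's code: the function names (`tokenize`, `bm25_scores`, `bm25_retrieve`), the whitespace tokenizer, the `k1`/`b` defaults and the IDF variant are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption; real retrievers
    # usually let you plug in a better preprocessing function).
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25.

    Only the raw texts are needed: no embeddings and no vector store.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    # Guard against a corpus of empty strings (avoids division by zero).
    avg_len = (sum(len(t) for t in tokenized) / n_docs) or 1.0

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    query_terms = tokenize(query)
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # One common non-negative IDF variant.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def bm25_retrieve(query, docs, top_k=2):
    # Return the top_k documents by descending BM25 score.
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [docs[i] for i in ranked[:top_k]]
```

Because the scores are computed from term statistics over the documents themselves, a retriever of this kind can run next to any vector DB, including one that offers only similarity search.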

Once again, thanks for flagging this issue! Truly appreciate it! Please reach out if you run into any other issues or challenges with your RAG optimization.

loiccordone commented 3 weeks ago

Hi @aravind10x, I am still encountering issues with 0.0.22. It now seems that it is not iterating through the selected retrievers, and it even selects the same retriever twice:

In this run, I selected "Vector DB - Similarity Search", "Multi Query Retriever" and "Parent Doc Retriever - Full doc", but only "Vector DB - Similarity Search" (often doubled) appeared in the 50 runs (with Bayesian optimization).
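A symptom like this can be checked mechanically. The sketch below is a hypothetical helper, not part of RAGbuilder: it assumes each run's retrievers are available as a list of names (RAGbuilder's actual result format may differ), and the function name `audit_retrievers` is made up for illustration. It reports retrievers that were used without being selected, selected retrievers that never appeared, and runs that contain the same retriever more than once.

```python
from collections import Counter


def audit_retrievers(selected, runs):
    """Compare the retrievers used in each run against the selected ones.

    selected: list of retriever names chosen in the configuration.
    runs: list of runs, each a list of retriever names (assumed format).
    """
    selected_set = set(selected)
    used = Counter(name for run in runs for name in run)
    return {
        # Retrievers that appeared in runs but were never selected.
        "unexpected": sorted(set(used) - selected_set),
        # Selected retrievers that no run ever tried.
        "never_used": sorted(selected_set - set(used)),
        # Indices of runs that list the same retriever more than once.
        "duplicated_runs": [
            i for i, run in enumerate(runs) if len(run) != len(set(run))
        ],
    }
```

For example, passing the three retrievers selected above together with runs that only ever contain "Vector DB - Similarity Search" would list the other two under `never_used`, and any run with that retriever doubled under `duplicated_runs`.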

Thank you

aravind10x commented 3 weeks ago

@loiccordone So sorry that this is still an issue. Let me try to reproduce and fix this right away. I will confirm once it is done.

aravind10x commented 2 weeks ago

@loiccordone - this is fixed now. Sorry about the sneaky bug! The fix has been pushed to the repo, but it is not yet available on PyPI or Docker. I will let you know once it is published there.

aravind10x commented 2 weeks ago

Hi @loiccordone - did you get a chance to test this? Please let us know if you run into any other issues.