explodinggradients / ragas

Supercharge Your LLM Application Evaluations 🚀
https://docs.ragas.io
Apache License 2.0

About RagChecker #1069

Open binghangli378 opened 4 months ago

binghangli378 commented 4 months ago

Recently, I came across a project similar to ragas called RagChecker. It offers a new perspective on evaluating RAG pipelines, focusing separately on:

This new perspective will provide a more detailed evaluation of the model’s performance, allowing for a deeper understanding of how different types of data chunks impact the evaluation process.

I am willing to design some new evaluation methods based on it. Please let me know if you are open to this idea, and I can provide further assistance or code examples.

jjmachan commented 3 months ago

@binghangli378 that is a very interesting idea, reopening this to track it further. Would you still like to help out on this?

@shahules786 something we can consider for #1010?

shahules786 commented 3 months ago

@binghangli378 Yes, this is very interesting. From what I observed, they have a few more metrics that are not available in Ragas (note that I just added #1174). I think the two metrics that would be beneficial are: 1) self-knowledge: this would be something like a 1 - faithfulness score, used to measure how much of the generated response contains knowledge from the LLM itself. 2) noise sensitivity: this is more interesting. I think what they are trying to achieve is

number of incorrect claims in the generated answer that came from irrelevant chunks / total number of claims in the answer

This could be used to understand how badly noise in the context affects the quality of the generated answer. I also found this paper showing that noise in the retrieved context affects answer quality.
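As a rough illustration of the two ideas above, here is a minimal sketch. It is not the Ragas or RagChecker implementation: the `Claim` class and both function names are hypothetical, and it assumes each claim in the answer has already been labelled (for example by an LLM judge) as correct or incorrect and as coming from an irrelevant chunk or not.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Claim:
    """Hypothetical container for one claim extracted from the answer."""
    text: str
    is_correct: bool             # assumed label: claim agrees with the ground truth
    from_irrelevant_chunk: bool  # assumed label: claim is attributed to an irrelevant chunk


def noise_sensitivity(claims: Sequence[Claim]) -> float:
    """Incorrect claims that came from irrelevant chunks / total claims in the answer."""
    if not claims:
        return 0.0  # assumption: an answer with no claims scores 0
    noisy = sum(1 for c in claims if not c.is_correct and c.from_irrelevant_chunk)
    return noisy / len(claims)


def self_knowledge(faithfulness: float) -> float:
    """Approximate self-knowledge as 1 - faithfulness, as suggested above."""
    return 1.0 - faithfulness


example_claims = [
    Claim("claim A", is_correct=True, from_irrelevant_chunk=False),
    Claim("claim B", is_correct=False, from_irrelevant_chunk=True),
    Claim("claim C", is_correct=False, from_irrelevant_chunk=False),
    Claim("claim D", is_correct=True, from_irrelevant_chunk=True),
]
print(noise_sensitivity(example_claims))
print(self_knowledge(0.75))
```

Only claim B counts toward the numerator here, since it is both incorrect and attributed to an irrelevant chunk. The hard part in practice is the labelling step, which this sketch takes as given.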

Tagging you in case you're interested in contributing. I have added them to the metrics roadmap. @sky-2002 @vaishakhRaveendran

sky-2002 commented 3 months ago

I can take up noise sensitivity. In fact, we had discussed something similar, which I was referring to as attributing each claim in the answer to some context.

shahules786 commented 3 months ago

@sky-2002 Sure, can you please comment in this issue so that I can assign it to you?