explodinggradients / ragas

Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines
https://docs.ragas.io
Apache License 2.0

New metric to quantify RAG improvement #199

Closed: YWen-AI closed this issue 3 months ago

YWen-AI commented 11 months ago

It would be beneficial to have an evaluation metric that measures the improvement brought by the RAG pipeline over the stand-alone LLM. This metric should do the following:

  1. Calculate the distance between the RAG-generated answers and the ground truths.
  2. Calculate the distance between the LLM stand-alone generated answers and the ground truths.
  3. Determine the improvement based on the two distances above. The resulting score should lie in [-1, 1], since the improvement could be negative (a minimal sketch follows the inputs list below).

Inputs required:

  1. Ground truths
  2. LLM stand-alone generated answers
  3. RAG generated answers
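
A minimal sketch of what such a metric could look like, assuming an embedding-based distance; sentence-transformers and the `all-MiniLM-L6-v2` model are illustrative choices (not part of ragas), and the `rag_improvement` name is hypothetical. Cosine similarity to the ground truth is rescaled to [0, 1] so that the difference between the RAG and LLM-only similarities stays within [-1, 1]:

```python
# Sketch only: not a ragas metric. Any embedding- or LLM-based distance
# could be substituted for the cosine similarity used here.
from sentence_transformers import SentenceTransformer, util

_model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model


def rag_improvement(ground_truths, llm_answers, rag_answers):
    """Return per-sample improvement scores in [-1, 1].

    score > 0  -> the RAG answer is closer to the ground truth than the
                  LLM-only answer (retrieval helped)
    score < 0  -> retrieval hurt
    score ~ 0  -> the external data added little beyond the LLM's own knowledge
    """
    gt_emb = _model.encode(ground_truths, convert_to_tensor=True)
    llm_emb = _model.encode(llm_answers, convert_to_tensor=True)
    rag_emb = _model.encode(rag_answers, convert_to_tensor=True)

    # Per-pair cosine similarity, rescaled from [-1, 1] to [0, 1] so the
    # difference below is bounded by [-1, 1].
    rag_sim = (util.cos_sim(rag_emb, gt_emb).diagonal() + 1) / 2
    llm_sim = (util.cos_sim(llm_emb, gt_emb).diagonal() + 1) / 2

    return (rag_sim - llm_sim).tolist()
```

Rescaling before subtracting keeps the score inside the proposed [-1, 1] range; a score near zero indicates that retrieval adds little beyond what the LLM already knows.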

shahules786 commented 11 months ago

Hey @wywdiablo, this is interesting. We explored a similar idea while working on the paper but couldn't quantify the difference effectively. How do you think this metric will help in development?

YWen-AI commented 11 months ago

@shahules786 I believe this metric will be important for certain use cases. For example, in one case we aim to enhance the answer by integrating the LLM's knowledge with a new external data source. A challenge arises when the RAG forms an answer based solely on the data we provided. If we offer a different prompt template in the RAG, such as "You are a professor in the XXX domain," it seems to only activate the LLM's inherent knowledge rather than drawing from the external data source. There should be a metric that informs the developer of this, allowing for better control.
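
As a hedged illustration of how such a metric could surface that problem, the hypothetical `rag_improvement()` helper sketched earlier in this thread could be run once per prompt template; a mean score near zero would suggest the answers lean on the LLM's inherent knowledge rather than the external data source:

```python
# Hypothetical diagnostic built on the rag_improvement() sketch above
# (assumed to be in scope); the lists below are placeholders for real data.
ground_truths = ["..."]      # reference answers
llm_only_answers = ["..."]   # answers from the LLM without retrieval
rag_by_template = {
    "default": ["..."],            # RAG answers with the default template
    "professor persona": ["..."],  # RAG answers with the "You are a professor..." template
}

for name, rag_answers in rag_by_template.items():
    scores = rag_improvement(ground_truths, llm_only_answers, rag_answers)
    mean_score = sum(scores) / len(scores)
    # A mean near zero suggests the external data contributes little;
    # a negative mean suggests retrieval is actively hurting the answers.
    print(f"{name}: mean improvement over LLM-only = {mean_score:+.2f}")
```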