explodinggradients / ragas

Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines
https://docs.ragas.io
Apache License 2.0

New metric to quantify RAG improvement #199

Closed: YWen-AI closed this issue 3 months ago

YWen-AI commented 11 months ago

It would be beneficial to have an evaluation metric that measures the improvement brought by the RAG pipeline over the stand-alone LLM. This metric should do the following:

  1. Calculate the distance between the RAG-generated answers and the ground truths.
  2. Calculate the distance between the LLM stand-alone generated answers and the ground truths.
  3. Determine the improvement based on the two distances above. The resulting score should lie in [-1, 1], since the improvement could be negative (a minimal sketch follows the inputs list below).

Inputs required:

  1. Ground truths
  2. LLM stand-alone generated answers
  3. RAG generated answers
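
A minimal sketch of what such a metric could look like, assuming an embedding-based distance; sentence-transformers and the `all-MiniLM-L6-v2` model are illustrative choices (not part of ragas), and the `rag_improvement` name is hypothetical. Cosine similarity to the ground truth is rescaled to [0, 1] so that the difference between the RAG and LLM-only similarities stays within [-1, 1]:

```python
# Sketch only: not a ragas metric. Any embedding- or LLM-based distance
# could be substituted for the cosine similarity used here.
from sentence_transformers import SentenceTransformer, util

_model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model


def rag_improvement(ground_truths, llm_answers, rag_answers):
    """Return per-sample improvement scores in [-1, 1].

    score > 0  -> the RAG answer is closer to the ground truth than the
                  LLM-only answer (retrieval helped)
    score < 0  -> retrieval hurt
    score ~ 0  -> the external data added little beyond the LLM's own knowledge
    """
    gt_emb = _model.encode(ground_truths, convert_to_tensor=True)
    llm_emb = _model.encode(llm_answers, convert_to_tensor=True)
    rag_emb = _model.encode(rag_answers, convert_to_tensor=True)

    # Per-pair cosine similarity, rescaled from [-1, 1] to [0, 1] so the
    # difference below is bounded by [-1, 1].
    rag_sim = (util.cos_sim(rag_emb, gt_emb).diagonal() + 1) / 2
    llm_sim = (util.cos_sim(llm_emb, gt_emb).diagonal() + 1) / 2

    return (rag_sim - llm_sim).tolist()
```

Rescaling before subtracting keeps the score inside the proposed [-1, 1] range; a score near zero indicates that retrieval adds little beyond what the LLM already knows.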

shahules786 commented 11 months ago

Hey @wywdiablo, this is interesting. We explored a similar idea while working on the paper but couldn't quantify the difference effectively. How do you think this metric will help in development?

YWen-AI commented 11 months ago

@shahules786 I believe this metric will be important for certain use cases. For example, in one case we aim to enhance the answer by integrating the LLM's knowledge with a new external data source. A challenge arises when the RAG forms an answer based solely on the data we provided. If we offer a different prompt template in the RAG, such as "You are a professor in the XXX domain," it seems to only activate the LLM's inherent knowledge rather than drawing from the external data source. There should be a metric that informs the developer of this, allowing for better control.
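
As a hedged illustration of how such a metric could surface that problem, the hypothetical `rag_improvement()` helper sketched earlier in this thread could be run once per prompt template; a mean score near zero would suggest the answers lean on the LLM's inherent knowledge rather than the external data source:

```python
# Hypothetical diagnostic built on the rag_improvement() sketch above
# (assumed to be in scope); the lists below are placeholders for real data.
ground_truths = ["..."]      # reference answers
llm_only_answers = ["..."]   # answers from the LLM without retrieval
rag_by_template = {
    "default": ["..."],            # RAG answers with the default template
    "professor persona": ["..."],  # RAG answers with the "You are a professor..." template
}

for name, rag_answers in rag_by_template.items():
    scores = rag_improvement(ground_truths, llm_only_answers, rag_answers)
    mean_score = sum(scores) / len(scores)
    # A mean near zero suggests the external data contributes little;
    # a negative mean suggests retrieval is actively hurting the answers.
    print(f"{name}: mean improvement over LLM-only = {mean_score:+.2f}")
```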