To get a similarity metric by comparing the output from your application with the golden sample data, you can use the `answer_correctness` metric from the `ragas` library. Here is an example of how to do this:
```python
from datasets import Dataset
from ragas.metrics import answer_correctness
from ragas import evaluate

# Each sample pairs the generated answer with its golden (ground truth) answer.
data_samples = {
    'question': ['When was the first super bowl?', 'Who won the most super bowls?'],
    'answer': ['The first superbowl was held on Jan 15, 1967', 'The most super bowls have been won by The New England Patriots'],
    'ground_truth': ['The first superbowl was held on January 15, 1967', 'The New England Patriots have won the Super Bowl a record six times'],
}

dataset = Dataset.from_dict(data_samples)

# Compute answer_correctness for every sample and view the per-row scores.
score = evaluate(dataset, metrics=[answer_correctness])
score.to_pandas()
```
This code calculates an answer correctness score between each generated answer and its ground truth answer, yielding a metric from 0 to 1, where a higher score indicates closer alignment.
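Continuing from the snippet above, you can also pull out the per-sample scores from the resulting dataframe; the score column is assumed here to be named after the metric:

```python
# Continuation of the example above: `score` is the object returned by evaluate().
df = score.to_pandas()

# One row per sample; the score column is assumed to match the metric name.
print(df[['question', 'answer_correctness']])
```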
Make sure you have your OpenAI API key set in your environment, as the `answer_correctness` metric uses OpenAI's API to compute the score:
```python
import os
os.environ["OPENAI_API_KEY"] = "your-openai-key"
```
You can also explore other metrics provided by the `ragas` library, such as `faithfulness`, `context_precision`, and `answer_relevancy`, to get a comprehensive evaluation of your RAG pipeline.
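If you want to combine several of these metrics, note that `faithfulness`, `context_precision`, and `answer_relevancy` also need the retrieved contexts for each sample. Below is a minimal sketch using the same column names as above (newer ragas versions may expect different column names, so check the docs for your version); the context passage here is made up for illustration:

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    answer_correctness,
    answer_relevancy,
    context_precision,
    faithfulness,
)

# Toy data; the 'contexts' entry is a made-up retrieved passage for illustration.
data_samples = {
    'question': ['When was the first super bowl?'],
    'answer': ['The first superbowl was held on Jan 15, 1967'],
    'contexts': [['The First AFL-NFL World Championship Game was played on January 15, 1967.']],
    'ground_truth': ['The first superbowl was held on January 15, 1967'],
}

dataset = Dataset.from_dict(data_samples)

# Evaluate several metrics in one pass; each produces a 0-1 score per sample.
score = evaluate(
    dataset,
    metrics=[answer_correctness, faithfulness, context_precision, answer_relevancy],
)
print(score.to_pandas())
```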
To install the `ragas` library, you can use pip:
```bash
pip install ragas
```
Alternatively, you can install it from source:
```bash
pip install git+https://github.com/explodinggradients/ragas
```
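After installing, you can confirm which package and version pip picked up:

```bash
pip show ragas
```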
For more detailed information and examples, refer to the Ragas documentation [1][2][3].
- [x] I checked the documentation and related resources and couldn't find an answer to my question.

Your Question
I already have golden data and I would like to get a similarity metric by comparing the output from the application with the golden sample data.