explodinggradients / ragas

Supercharge Your LLM Application Evaluations 🚀
https://docs.ragas.io
Apache License 2.0

How can I ignore specific symbols when evaluating answer_correctness? #1282

Open landhu opened 1 month ago

landhu commented 1 month ago

[ ] I checked the documentation and related resources and couldn't find an answer to my question.

Your Question

My dataset: {'question': 'what are you going', 'answer': "I'm going to have a meal", 'ground_truth': "I'm going to have a meal."}

Code Examples

score = evaluate(dataset, metrics=[answer_correctness, answer_similarity])

Additional context

The result is: {'answer_correctness': 0.2490, 'answer_similarity': 0.9960}

But when the dataset is {'question': 'what are you going', 'answer': "I'm going to have a meal.", 'ground_truth': "I'm going to have a meal."} (the only difference is the trailing period in the answer), the result is {'answer_correctness': 1.0000, 'answer_similarity': 1.0000}.

My LLM: OpenAI GPT-4o. Embedding model: amazon.titan-embed-text-v2:0.

### Question: Is there any way to score correctness higher by ignoring specific symbols, like "---" or "****", or specific characters? Sometimes even when the answer and the ground truth are the same, the score is low, and I'm not sure how to improve it. Can ground_truth support multiple reference answers? For example: {'question': 'what are you going', 'answer': "I'm going to have a meal", 'ground_truth': ["I'm going to have a meal.", "xxxxxx", "xxxxxx"]}
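For reference, a self-contained version of the reproduction might look like the sketch below. It assumes the ragas 0.1.x API, where evaluate() takes a Hugging Face Dataset with question, answer and ground_truth columns; adjust the imports and LLM/embedding configuration to your installed version.

# Minimal reproduction sketch (assumes ragas 0.1.x; LLM/embeddings configured as in the issue)
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_correctness, answer_similarity

data = {
    "question": ["what are you going"],
    "answer": ["I'm going to have a meal"],         # no trailing period
    "ground_truth": ["I'm going to have a meal."],  # trailing period
}
dataset = Dataset.from_dict(data)

score = evaluate(dataset, metrics=[answer_correctness, answer_similarity])
print(score)  # e.g. {'answer_correctness': 0.2490, 'answer_similarity': 0.9960}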

dosubot[bot] commented 1 month ago

To ignore specific symbols when evaluating answer_correctness in the Ragas framework, you can customize the AnswerCorrectness class to include additional preprocessing steps. Here’s how you can extend the class to remove specific symbols before evaluation:

  1. Extend the AnswerCorrectness class: Add a preprocessing method to remove specific symbols and integrate it into the existing methods.
from dataclasses import dataclass

from ragas.metrics import AnswerCorrectness
from ragas.llms.prompt import PromptValue  # used in the type hint; import path may vary across ragas versions

@dataclass
class CustomAnswerCorrectness(AnswerCorrectness):
    def _create_statements_prompt(self, question: str, text: str) -> PromptValue:
        assert self.sentence_segmenter is not None, "sentence_segmenter is not set"

        # Additional preprocessing logic
        text = self._preprocess_text(text)

        sentences = self.sentence_segmenter.segment(text)
        # Keep only sentences that end with a period; anything else is dropped
        sentences = [
            sentence for sentence in sentences if sentence.strip().endswith(".")
        ]
        sentences = "\n".join([f"{i}:{x}" for i, x in enumerate(sentences)])
        prompt_value = self.long_form_answer_prompt.format(
            question=question, answer=text, sentences=sentences
        )
        return prompt_value

    def _preprocess_text(self, text: str) -> str:
        # Example preprocessing: remove specific symbols
        symbols_to_ignore = ['---', '****']
        for symbol in symbols_to_ignore:
            text = text.replace(symbol, '')
        return text
  2. Use the custom class in your evaluation: Replace the default answer_correctness metric with your custom class.
from ragas import evaluate
from ragas.metrics import answer_similarity

# Import your custom class (here assumed to live in custom_metrics.py)
from custom_metrics import CustomAnswerCorrectness

# Use the custom class in your evaluation
score = evaluate(dataset, metrics=[CustomAnswerCorrectness(), answer_similarity])

This customization will preprocess the text to remove the specified symbols before evaluating answer_correctness, potentially leading to higher correctness scores when those symbols are present in the answer or ground_truth [1].
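A lighter-weight alternative, sketched below rather than taken from the Ragas API, is to normalize the dataset itself before calling evaluate(): strip the symbols you want ignored and make sure every text ends with a period, so the stock answer_correctness metric can be used without subclassing. The normalize helper here is hypothetical, not a Ragas function.

# Sketch of dataset-level normalization before evaluation (assumes ragas 0.1.x)
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_correctness, answer_similarity

def normalize(text: str) -> str:
    # Strip symbols to ignore and ensure a trailing period so no sentence is dropped
    for symbol in ("---", "****"):
        text = text.replace(symbol, "")
    text = text.strip()
    return text if text.endswith(".") else text + "."

data = {
    "question": ["what are you going"],
    "answer": [normalize("I'm going to have a meal")],
    "ground_truth": [normalize("I'm going to have a meal.")],
}
dataset = Dataset.from_dict(data)

score = evaluate(dataset, metrics=[answer_correctness, answer_similarity])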

To continue talking to Dosu, mention @dosu.

Tunnello commented 1 month ago

@landhu https://github.com/explodinggradients/ragas/blob/main/src/ragas/metrics/_answer_correctness.py#L210 It is because of this code in the function _create_statements_prompt in the file _answer_correctness.py:

sentences = [
    sentence for sentence in sentences if sentence.strip().endswith(".")
]
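Running that filter on the two texts from the issue shows the effect: the answer without a trailing period yields no statements, while the ground truth survives, so the factuality part of answer_correctness goes to zero and roughly only the similarity-weighted component remains, which appears to match the reported 0.2490. A plain-Python illustration (treating each text as a single sentence for simplicity, not the actual Ragas segmentation):

answer = "I'm going to have a meal"         # user's answer, no trailing period
ground_truth = "I'm going to have a meal."  # reference, with trailing period

for label, text in [("answer", answer), ("ground_truth", ground_truth)]:
    # Same condition as the filter above, applied to the whole text
    kept = [s for s in [text] if s.strip().endswith(".")]
    print(label, "->", kept)

# answer -> []                                   (filtered out)
# ground_truth -> ["I'm going to have a meal."]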