explodinggradients / ragas

Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines
https://docs.ragas.io
Apache License 2.0

QuestionFilter filter logic #1029

Open Padarn opened 2 weeks ago

Padarn commented 2 weeks ago

[ x ] I checked the documentation and related resources and couldn't find an answer to my question.

Your Question

I am trying to understand the RAGAS question filters, in particular QuestionFilter. My confusion stems from this part of the prompt (and the corresponding examples):

1. Independence: Can the question be understood and answered without needing additional context or access to external references not provided within the question itself? Questions should be self-contained, meaning they do not rely on specific documents, tables, or prior knowledge not shared within the question.

Because context is not provided along with the question when assessed:

    async def filter(self, question: str) -> t.Tuple[bool, str]:
        prompt = self.filter_question_prompt.format(question=question)
        results = await self.llm.generate(prompt=prompt)
        results = results.generations[0][0].text.strip()
        results = await question_filter_parser.aparse(results, prompt, self.llm)
        results = results.dict() if results is not None else {}
        logger.debug("filtered question: %s", results)
        return results.get("verdict") == 1, results.get("feedback", "")

It is a bit unclear why we would want to filter out these types of questions.

Additional context

Not really important to the question, but this came up when I was trying to generate examples from very simple documents to understand the output better:

documents = [
    Document("Padarn is ten years old. He likes to play football. One day XXXX = 1000000000000000000000")
]

dosubot[bot] commented 2 weeks ago

Hello, @Padarn! I'm here to help you with any bugs, questions, or contributions you may have. Let's see how I can assist you today!

Filtering questions that need additional context or external references not provided within the question itself is crucial for ensuring that the system can accurately and effectively answer the questions without requiring information that is not available. This helps maintain the quality and reliability of the responses generated by the system.

The QuestionFilter class in RAGAS is designed to determine whether a question is self-contained or requires additional information. Its filter method receives only the question, not any context. Here's a breakdown of how it works:

  1. Prompt Formatting: The filter_question_prompt is formatted with the input question.
  2. LLM Generation: The formatted prompt is sent to the LLM to generate a response.
  3. Text Stripping: The generated text is stripped of any leading or trailing whitespace.
  4. Parsing: The stripped text is parsed using question_filter_parser.aparse, which converts it into a structured object, or returns None if parsing fails.
  5. Result Extraction: The parsed result is converted to a dictionary with .dict() if it is not None; otherwise an empty dictionary is used.
  6. Logging: The parsed results are logged for debugging purposes.
  7. Return Values: The function returns a tuple containing a boolean verdict (True if the verdict is 1, otherwise False) and any feedback provided in the results.
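
The steps above can be sketched end to end without calling a real model. This is a minimal sketch, not the RAGAS implementation: `StubLLM`, `parse_verdict`, `filter_question`, and the prompt wording are all hypothetical stand-ins, and the stub returns a plain string where the real LLM call returns a result object.

```python
import asyncio
import json
import typing as t


class StubLLM:
    """Hypothetical stand-in for the real LLM; returns a canned reply."""

    def __init__(self, canned: str) -> None:
        self.canned = canned

    async def generate(self, prompt: str) -> str:
        # The real call returns a result object; the quoted source reads
        # results.generations[0][0].text from it.
        return self.canned


def parse_verdict(text: str) -> t.Optional[dict]:
    """Hypothetical parser: JSON text to a dict, or None if parsing fails."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


async def filter_question(llm: StubLLM, question: str) -> t.Tuple[bool, str]:
    # Steps 1-2: format the prompt with the question only (no context), call the LLM.
    prompt = f"Assess whether this question is self-contained: {question}"
    raw = await llm.generate(prompt=prompt)
    # Step 3: strip surrounding whitespace.
    text = raw.strip()
    # Steps 4-5: parse, falling back to an empty dict when parsing fails.
    results = parse_verdict(text) or {}
    # Step 7: a verdict of exactly 1 passes; feedback defaults to "".
    return results.get("verdict") == 1, results.get("feedback", "")


accepted = asyncio.run(
    filter_question(StubLLM('  {"verdict": 1, "feedback": "ok"}  '), "What is RAG?")
)
rejected = asyncio.run(
    filter_question(StubLLM("not json"), "What does the table show?")
)
print(accepted)  # (True, 'ok')
print(rejected)  # (False, '')
```

Note that the question is rejected whenever the reply cannot be parsed, because the empty dictionary has no verdict.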

This filtering process ensures that the system only attempts to answer questions that it has sufficient information for, thereby improving the accuracy and reliability of the responses [1].

The question_filter_parser does not apply criteria of its own: it only converts the LLM's reply into a structured form. The self-containment judgment is made by the LLM against the prompt and reported in the verdict field of the parsed results. If the verdict is 1, the question is treated as self-contained and passes the filter [1].
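
To make the verdict check concrete, the sketch below mirrors the last two lines of the quoted filter method on an already-parsed result. The helper name `to_verdict` and the sample feedback strings are hypothetical. Whether the real parser always yields an integer verdict is not shown in this thread, so the string case is included only to illustrate how the `== 1` comparison behaves in Python.

```python
import typing as t


def to_verdict(results: t.Optional[dict]) -> t.Tuple[bool, str]:
    """Mirror the final lines of filter() on an already-parsed result."""
    results = results if results is not None else {}
    return results.get("verdict") == 1, results.get("feedback", "")


passed = to_verdict({"verdict": 1, "feedback": "self-contained"})
failed = to_verdict({"verdict": 0, "feedback": "refers to 'the table'"})
string_verdict = to_verdict({"verdict": "1"})  # the string "1" is not equal to the int 1
unparsed = to_verdict(None)  # a failed parse is rejected with empty feedback

print(passed)
print(failed)
print(string_verdict)
print(unparsed)
```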

By filtering out questions that require additional context, the system avoids generating potentially inaccurate or incomplete answers, which is especially important in scenarios where the quality and reliability of the responses are critical.

To continue talking to Dosu, mention @dosu.

Padarn commented 2 weeks ago

Thanks, bot, but I'm still unclear on this: the questions we are writing are presumably part of a RAG system that will try to add that context.

dosubot[bot] commented 2 weeks ago

In a Retrieval Augmented Generation (RAG) system, the QuestionFilter is used to ensure that the questions being processed can be answered accurately with the available context. Even though the RAG system will try to add context, the initial filtering helps maintain the quality and reliability of the responses by ensuring that the questions are self-contained and do not rely on external references not provided within the question itself.

Here are the key reasons for this filtering:

  1. Ensuring Self-Containment: By filtering out questions that require additional context, the system ensures that the questions are self-contained. This means that the questions can be understood and answered without needing external references, which helps in generating more accurate and relevant responses.

  2. Improving Response Quality: Filtering out questions that need additional context helps in avoiding incomplete or inaccurate answers. This is crucial for maintaining the quality of the responses generated by the RAG system.

  3. Simplifying Processing: Self-contained questions are easier to process and answer, as they do not require the system to fetch and integrate additional information. This simplifies the overall processing and improves the efficiency of the system.

  4. Consistency: Ensuring that all questions are self-contained helps in maintaining consistency in the responses. This is important for providing a reliable user experience.

The QuestionFilter class in RAGAS uses a prompt to determine if a question can be answered with the available information. If a question requires additional context or external references, it may not be possible to generate a reliable answer, leading to the need for filtering such questions [1].

By implementing this filter, the RAG system can focus on questions that it can answer accurately with the given context, thereby improving the overall performance and reliability of the system.
