Failed to parse output.

explodinggradients / ragas

Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines

Apache License 2.0

6.57k stars 646 forks

import pandas as pd
from datasets import Dataset
from transformers import AutoTokenizer
from langchain_community.llms.huggingface_endpoint import HuggingFaceEndpoint
from langchain_community.chat_models.huggingface import ChatHuggingFace  # import was missing
from langchain_community.embeddings import HuggingFaceEmbeddings  # import was missing
from ragas import evaluate
from ragas.metrics import (
    faithfulness,
    answer_correctness,
    answer_relevancy,
    context_recall,
    context_precision,
)

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")

df = pd.read_csv("output.csv", sep=";")
data_samples = {
    "question": df["question"].tolist(),
    "answer": df["answer"].tolist(),
    # ragas expects a list of context strings per row
    "contexts": df["contexts"].apply(lambda x: [x] if isinstance(x, str) else x).tolist(),
    "ground_truth": df["ground_truth"].tolist(),
}
dataset = Dataset.from_dict(data_samples)

end = HuggingFaceEndpoint(repo_id="mistralai/Mistral-7B-Instruct-v0.3", max_new_tokens=512)
huggingface_llm = ChatHuggingFace(llm=end, tokenizer=tokenizer)
# nomic-embed-text-v1.5 ships custom model code, so sentence-transformers needs trust_remote_code
huggingface_embeddings = HuggingFaceEmbeddings(
    model_name="nomic-ai/nomic-embed-text-v1.5",
    model_kwargs={"trust_remote_code": True},
)

metrics = [faithfulness, answer_correctness, answer_relevancy, context_recall, context_precision]
score = evaluate(
    dataset=dataset,
    metrics=metrics,
    llm=huggingface_llm,
    embeddings=huggingface_embeddings,
    raise_exceptions=False,
)

Failed to parse output. Returning None.
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Exception raised in Job[304]: ValidationError(2 validation errors for ContextPrecisionVerifications __root__ -> 0 -> reason field required (type=value_error.missing) __root__ -> 0 -> verdict field required (type=value_error.missing))
Failed to parse output. Returning None.
Exception raised in Job[444]: ValidationError(2 validation errors for ContextPrecisionVerifications __root__ -> 0 -> reason field required (type=value_error.missing) __root__ -> 0 -> verdict field required (type=value_error.missing))
Exception raised in Job[169]: ValidationError(2 validation errors for ContextPrecisionVerifications __root__ -> 0 -> reason field required (type=value_error.missing)
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Exception raised in Job[309]: ValidationError(2 validation errors for ContextPrecisionVerifications __root__ -> 0 -> reason field required (type=value_error.missing) __root__ -> 0 -> verdict field required (type=value_error.missing))
Failed to parse output. Returning None.
Exception raised in Job[174]: ValidationError(2 validation errors for ContextPrecisionVerifications __root__ -> 0 -> reason field required (type=value_error.missing) __root__ -> 0 -> verdict field required (type=value_error.missing))
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Exception raised in Job[449]: ValidationError(2 validation errors for ContextPrecisionVerifications __root__ -> 0 -> reason field required (type=value_error.missing) __root__ -> 0 -> verdict field required (type=value_error.missing))
Failed to parse output. Returning None.
Exception raised in Job[179]: ValidationError(2 validation errors for ContextPrecisionVerifications __root__ -> 0 -> reason field required (type=value_error.missing) __root__ -> 0 -> verdict field required (type=value_error.missing))
Failed to parse output. Returning None.
Exception raised in Job[314]: ValidationError(2 validation errors for ContextPrecisionVerifications __root__ -> 0 -> reason field required (type=value_error.missing) __root__ -> 0 -> verdict field required (type=value_error.missing))
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Exception raised in Job[184]: ValidationError(2 validation errors for ContextPrecisionVerifications __root__ -> 0 -> reason field required (type=value_error.missing) __root__ -> 0 -> verdict field required (type=value_error.missing))
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Exception raised in Job[454]: ValidationError(2 validation errors for ContextPrecisionVerifications __root__ -> 0 -> reason field required (type=value_error.missing) __root__ -> 0 -> verdict field required (type=value_error.missing))
Failed to parse output. Returning None.
Exception raised in Job[461]: ClientResponseError(429, message='Too Many Requests', url=URL('https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.3'))
Exception raised in Job[196]: ClientResponseError(429, message='Too Many Requests', url=URL('https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.3'))
Exception raised in Job[462]: ClientResponseError(429, message='Too Many Requests', url=URL('https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.3'))

To resolve the "Failed to parse output" and "ValidationError" issues when using the evaluate function from the ragas library with the HuggingFaceEndpoint and Mistral-7B-Instruct-v0.3 model, you can follow these steps:

Ensure JSON Schema Compliance: The model's output must be valid JSON that matches the schema expected by the PydanticOutputParser. The get_json_format_instructions function generates the format instructions that describe this schema to the model. A small instruction-tuned model such as Mistral-7B can still add prose around the JSON or leave out required fields, and either one makes parsing or validation fail.
Use the FIX_OUTPUT_FORMAT Prompt: When parsing fails, RagasoutputParser.aparse sends the original prompt and the faulty completion back to the model using the FIX_OUTPUT_FORMAT prompt and asks it to correct the format. Each "Failed to parse output. Returning None." line in the log means these repair attempts were exhausted and that output was discarded.
Increase max_retries: If the output often fails to parse correctly, consider increasing the max_retries parameter in the aparse method to give the model more chances to correct its output.
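One plausible cause of the parse failures (an assumption; the raw completions are not shown in the log) is that the chat model surrounds its JSON with explanatory prose, which a strict parser rejects. The stdlib-only sketch below uses a hypothetical extract_json helper, not part of ragas, to pull the first JSON object or array out of a raw completion so you can inspect or pre-clean outputs:

```python
import json
from typing import Any, Optional


def extract_json(completion: str) -> Optional[Any]:
    """Return the first JSON object or array embedded in `completion`, or None.

    Hypothetical helper: tolerates prose (or Markdown fences) around the JSON.
    """
    decoder = json.JSONDecoder()
    for start, ch in enumerate(completion):
        if ch in "[{":
            try:
                # raw_decode parses one JSON value starting at `start`
                # and ignores whatever text follows it.
                value, _end = decoder.raw_decode(completion, start)
                return value
            except json.JSONDecodeError:
                continue  # not valid JSON from this position; keep scanning
    return None


raw = 'Sure! Here is the result:\n[{"reason": "relevant", "verdict": 1}]\nHope this helps.'
print(extract_json(raw))
```

Scanning with raw_decode avoids hand-written bracket matching: the standard decoder decides where a valid value ends.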

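Here is an example of how you might adjust the library's RagasoutputParser class to handle these issues. The change has to be made in the installed ragas source: defining a class with the same name in your own script does not change the parser instances the metrics already use.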

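# Excerpt of the ragas output-parser module: PydanticOutputParser, OutputParserException,
# PromptValue, BaseRagasLLM, FIX_OUTPUT_FORMAT and logger are already in scope there.
class RagasoutputParser(PydanticOutputParser):
    async def aparse(  # type: ignore
        self,
        result: str,
        prompt: PromptValue,
        llm: BaseRagasLLM,
        max_retries: int = 3,  # raised default; has no effect when the caller passes max_retries
    ):
        try:
            output = super().parse(result)
        except OutputParserException:
            if max_retries != 0:
                # Ask the model to repair its own completion, then try parsing again.
                p_value = FIX_OUTPUT_FORMAT.format(
                    prompt=prompt.to_string(), completion=result
                )
                output = await llm.generate(p_value)
                result = output.generations[0][0].text
                return await self.aparse(result, prompt, llm, max_retries - 1)
            else:
                # Retries exhausted: this is the warning seen in the log.
                logger.warning("Failed to parse output. Returning None.")
                return None
        return output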

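By increasing the max_retries parameter, you give the model more opportunities to correct its output format, which can reduce the "Failed to parse output" and "ValidationError" issues ^[1]. Note that changing the default only matters when the caller omits max_retries; if the calling metric passes its own value, that value is used instead.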
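To make the retry count concrete: with max_retries=3 the parser makes up to four parse attempts (the first parse plus three repair rounds) and asks the model for a repair at most three times. The stdlib-only sketch below is an iterative rewrite of the same control flow, with json.loads standing in for the Pydantic parser and a hypothetical fix_completion callable standing in for the FIX_OUTPUT_FORMAT call to the LLM:

```python
import json
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


def parse_with_retries(
    completion: str,
    fix_completion: Callable[[str], str],
    max_retries: int = 3,
) -> Optional[Any]:
    """Iterative sketch of the recursive aparse retry loop.

    `fix_completion` is a hypothetical stand-in for re-prompting the model.
    """
    for attempt in range(max_retries + 1):  # first parse plus max_retries repairs
        try:
            return json.loads(completion)
        except json.JSONDecodeError:
            if attempt == max_retries:
                break  # no repair rounds left
            completion = fix_completion(completion)
    logger.warning("Failed to parse output. Returning None.")
    return None
```

A loop instead of recursion makes it easier to see that the number of repair calls is exactly max_retries in the worst case.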

The evaluate function also takes several parameters that affect how the model is called and how its output and errors are handled:

metrics: List of metrics to use for evaluation.
llm: The language model to use for the metrics.
embeddings: The embeddings to use for the metrics.
callbacks: LangChain callbacks to run during evaluation.
in_ci: Whether the evaluation is running in CI.
run_config: Configuration for runtime settings such as timeouts and retries.
token_usage_parser: Parser that extracts token usage from the LLM result.
raise_exceptions: If True, an exception in any job aborts the evaluation; if False (as in the code above), the exception is logged and the affected score is recorded as NaN.
column_map: Mapping from the column names ragas expects to the column names in your dataset.

These parameters can influence how the evaluation is conducted and how the results are parsed and returned ^[2]^[3].
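Because run_config covers timeouts and retries, it is also the natural place to reduce request pressure on a rate-limited endpoint. A minimal sketch, assuming a ragas 0.1.x release whose RunConfig (in ragas.run_config) has the max_workers, max_retries, max_wait and timeout fields; the values are illustrative guesses, not tuned recommendations:

```python
from ragas.run_config import RunConfig

# Illustrative values for a rate-limited endpoint (assumptions; tune to your quota).
run_config = RunConfig(
    max_workers=2,   # cap on concurrent jobs, so fewer simultaneous API calls
    max_retries=10,  # retry attempts where ragas wraps calls with retries
    max_wait=60,     # maximum wait in seconds between retries
    timeout=180,     # per-call timeout in seconds
)

# Pass it alongside the other arguments, e.g.:
# score = evaluate(dataset=dataset, metrics=metrics, llm=huggingface_llm,
#                  embeddings=huggingface_embeddings, run_config=run_config,
#                  raise_exceptions=False)
```

Lowering max_workers trades total run time for fewer simultaneous requests, which is usually what a free-tier inference endpoint needs.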

For the ContextPrecisionVerifications class, the expected JSON schema is:

{
  "__root__": [
    {
      "reason": "string",
      "verdict": 0 or 1
    }
  ]
}

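Each ContextPrecisionVerification object contains the fields reason (a string) and verdict (an integer, 0 or 1) ^[4]. The ValidationError messages in the log (for example "__root__ -> 0 -> reason field required") therefore mean that the first object in the list parsed from the model's output was missing these required fields.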
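To check raw model outputs against this schema before they reach ragas, a small validator can help. This is a hypothetical, stdlib-only sketch (validate_verifications is not a ragas function); it accepts either a bare list or the {"__root__": [...]} wrapper and reports problems using the same path style as the log messages:

```python
import json
from typing import Any, List


def validate_verifications(payload: Any) -> List[str]:
    """Return a list of problems; an empty list means the payload has the expected shape."""
    if isinstance(payload, dict) and set(payload) == {"__root__"}:
        payload = payload["__root__"]  # unwrap the pydantic-style root key
    if not isinstance(payload, list):
        return ["expected a JSON list of verification objects"]
    problems: List[str] = []
    for i, item in enumerate(payload):
        if not isinstance(item, dict):
            problems.append(f"__root__ -> {i}: expected an object")
            continue
        if not isinstance(item.get("reason"), str):
            problems.append(f"__root__ -> {i} -> reason: missing or not a string")
        verdict = item.get("verdict")
        # bool is a subclass of int in Python, so reject True/False explicitly
        if isinstance(verdict, bool) or verdict not in (0, 1):
            problems.append(f"__root__ -> {i} -> verdict: missing or not 0/1")
    return problems


print(validate_verifications(json.loads('[{"reason": "mentions the answer", "verdict": 1}]')))
print(validate_verifications([{}]))  # the same failure as in the log: both fields missing
```

This check is stricter than pydantic, which may coerce values such as the string "1" to an integer.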

Lastly, the ClientResponseError with status code 429 (Too Many Requests) means you are hitting the rate limits of the Hugging Face Inference API. Reducing the number of concurrent requests and retrying failed calls with backoff are the usual ways to handle these errors gracefully. Unfortunately, I couldn't find specific details in the repository on how the HuggingFaceEndpoint class handles these errors.
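The retry-with-backoff idea can be sketched with asyncio alone. This is a minimal illustration, not HuggingFaceEndpoint or ragas behavior: RateLimitError and call_with_backoff are hypothetical names, and in real code you would catch aiohttp.ClientResponseError and retry only when its status attribute is 429:

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 error from the inference API."""


async def call_with_backoff(
    make_call: Callable[[], Awaitable[T]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
) -> T:
    """Await make_call(), retrying on RateLimitError with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return await make_call()
        except RateLimitError:
            attempt += 1
            if attempt >= max_attempts:
                raise  # out of attempts: surface the 429 to the caller
            # Full jitter: sleep a random time up to base_delay * 2**(attempt - 1), capped.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            await asyncio.sleep(random.uniform(0, delay))
```

Full jitter spreads the retries of many concurrent jobs over time, so they do not all hit the endpoint again at the same moment.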


Failed to parse output. #1228