explodinggradients / ragas

Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines
https://docs.ragas.io
Apache License 2.0

Failed to parse output. Returning None. #1206

Open surengunturuamazon opened 1 month ago

surengunturuamazon commented 1 month ago

- [x] I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug
When calling evaluate on Ragas using AWS Bedrock, some examples produce the warning "Failed to parse output. Returning None." and the error "root -> 1 -> reason field required (type=value_error.missing)".

Ragas version: 0.1.13
Python version: 3.10.14

Code to Reproduce

from langchain_community.chat_models import BedrockChat
from langchain_community.embeddings import BedrockEmbeddings
from datasets import load_dataset

amnesty_qa = load_dataset("explodinggradients/amnesty_qa", "english_v2")

config = {
    "region_name": "us-east-1",  # E.g. "us-east-1"
    "model_id": "anthropic.claude-3-haiku-20240307-v1:0",  # E.g. "anthropic.claude-v2"
    "model_kwargs": {"temperature": 0.4},
}

bedrock_model = BedrockChat(
    region_name=config["region_name"],
    endpoint_url=f"https://bedrock-runtime.{config['region_name']}.amazonaws.com",
    model_id=config["model_id"],
    model_kwargs=config["model_kwargs"],
)

bedrock_embeddings = BedrockEmbeddings(
    region_name=config["region_name"],
)

from ragas import evaluate
import nest_asyncio  # needed when running inside a notebook/event loop

nest_asyncio.apply()

# `metrics` must be defined beforehand, e.g. a list of ragas metrics
result = evaluate(
    amnesty_qa["eval"],
    metrics=metrics,
    llm=bedrock_model,
    embeddings=bedrock_embeddings,
)

result

Error trace

Failed to parse output. Returning None.
Failed to parse output. Returning None.
Exception raised in Job[34]: ValidationError(2 validation errors for ContextPrecisionVerifications
root -> 1 -> reason
  field required (type=value_error.missing)
root -> 1 -> verdict
  field required (type=value_error.missing))
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Failed to parse output. Returning None.

The above is for a subset of 20 examples that are evaluated.

Expected behavior
None of the warnings above should appear.

Additional context
This error/warning does not stop execution, and most examples are evaluated smoothly. However, the examples that fail to parse end up with a NaN value for the affected metric.
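A quick way to locate the affected rows, assuming the `result` object from the snippet above and that context_precision was among the chosen metrics (the ContextPrecisionVerifications error suggests it was); `Result.to_pandas()` is the standard ragas 0.1.x export:

df = result.to_pandas()  # per-row scores as a pandas DataFrame
failed = df[df["context_precision"].isna()]  # rows the parser gave up on
print(f"{len(failed)} of {len(df)} examples came back NaN")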

dosubot[bot] commented 1 month ago

Hello, @surengunturuamazon! I'm here to help you with any bugs, questions, or contributions.

To resolve the "Failed to parse output. Returning None." and "root -> 1 -> reason field required (type=value_error.missing)" errors when calling evaluate on Ragas using AWS Bedrock, you need to ensure that the JSON output conforms to the expected schema and includes all required fields. The error message indicates that the "reason" field is missing in the JSON output.

Here is a step-by-step approach to resolve the issue:

  1. Check the Schema: Ensure that the schema used in get_json_format_instructions includes "reason" as a required field.

  2. Modify the Schema: If the "reason" field is not included, add it to the schema.

  3. Update the Prompt: Ensure that the prompt used to generate the output explicitly instructs the model to include the "reason" field.

Here is an example of how you can modify the schema and prompt:

import json
import typing as t

# TBaseModel and JSON_FORMAT_INSTRUCTIONS come from ragas' internals.
def get_json_format_instructions(pydantic_object: t.Type[TBaseModel]) -> str:
    # Copy the schema to avoid altering the original Pydantic schema.
    reduced_schema = {k: v for k, v in pydantic_object.schema().items()}

    # Remove extraneous fields.
    if "title" in reduced_schema:
        del reduced_schema["title"]

    # Add the "reason" field to the schema if it's missing.
    if "properties" in reduced_schema and "reason" not in reduced_schema["properties"]:
        reduced_schema["properties"]["reason"] = {"type": "string"}
        if "required" in reduced_schema:
            reduced_schema["required"].append("reason")
        else:
            reduced_schema["required"] = ["reason"]

    # Serialize only after the modification, so the added "reason" field
    # actually reaches the instructions; json.dumps also guarantees
    # well-formed JSON with double quotes.
    schema_str = json.dumps(reduced_schema)

    return JSON_FORMAT_INSTRUCTIONS.format(schema=schema_str)

# Ensure the prompt includes instructions to include the "reason" field
FIX_OUTPUT_FORMAT = Prompt(
    name="",
    instruction="Below, the Completion did not satisfy the constraints given in the Prompt.",
    output_format_instruction="Ensure the output includes a 'reason' field.",
    input_keys=["prompt", "completion"],
    output_key="fixed_completion",
)
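As a quick sanity check (a sketch, not from the original reply), the patched helper can be exercised with a toy Pydantic model; TBaseModel and JSON_FORMAT_INSTRUCTIONS are assumed to be the ragas internals referenced above:

from pydantic import BaseModel

class ContextPrecisionVerification(BaseModel):
    # Toy stand-in that deliberately omits "reason".
    verdict: int

instructions = get_json_format_instructions(ContextPrecisionVerification)
assert '"reason"' in instructions  # the patched schema now requires it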

Additionally, when using AWS Bedrock with Ragas, ensure that you have correctly configured the BedrockChat and BedrockEmbeddings classes from Langchain, as shown in the following example:

from langchain_community.chat_models import BedrockChat
from langchain_community.embeddings import BedrockEmbeddings

config = {
    "region_name": "us-east-1",  # E.g. "us-east-1"
    "model_id": "anthropic.claude-3-haiku-20240307-v1:0",  # E.g "anthropic.claude-v2"
    "model_kwargs": {"temperature": 0.4},
}

bedrock_model = BedrockChat(
    region_name=config["region_name"],
    endpoint_url=f"https://bedrock-runtime.{config['region_name']}.amazonaws.com",
    model_id=config["model_id"],
    model_kwargs=config["model_kwargs"],
)
# init the embeddings
bedrock_embeddings = BedrockEmbeddings(
    region_name=config["region_name"],
)
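Before running the full evaluation, a short smoke test (not part of the original reply) can confirm credentials and model access; invoke and embed_query are the standard LangChain calls on these classes:

# Any auth/region problem surfaces here instead of as parse
# failures mid-evaluation.
reply = bedrock_model.invoke("Reply with the single word: ok")
print(reply.content)

vector = bedrock_embeddings.embed_query("smoke test")
print(f"embedding dimension: {len(vector)}")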

Furthermore, ensure that the EvaluatorChain class in Ragas is correctly validating the required fields. The _validate method in the EvaluatorChain class should check for the presence of all necessary fields, including "reason":

def _validate(
    self,
    input: dict[str, t.Any],
    question_key: str = "question",
    prediction_key: str = "answer",
    context_key: str = "contexts",
) -> None:
    # Validate each example against the columns the metric requires.
    required_columns = EVALMODE_TO_COLUMNS[self.metric.evaluation_mode]
    if "question" in required_columns and question_key not in input:
        raise ValueError(
            f'"{question_key}" is required in each example '
            f"for the metric[{self.metric.name}] you have chosen."
        )
    if "answer" in required_columns and prediction_key not in input:
        raise ValueError(
            f'"{prediction_key}" is required in each prediction '
            f"for the metric[{self.metric.name}] you have chosen."
        )
    if "contexts" in required_columns and context_key not in input:
        raise ValueError(
            f'"{context_key}" is required in each prediction for the '
            f"metric[{self.metric.name}] you have chosen."
        )
    if "ground_truth" in required_columns and "ground_truth" not in input:
        raise ValueError(
            f'"ground_truth" is required in each prediction for the '
            f"metric[{self.metric.name}] you have chosen."
        )
    if "reason" in required_columns and "reason" not in input:
        raise ValueError(
            f'"reason" is required in each prediction for the '
            f"metric[{self.metric.name}] you have chosen."
        )
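For illustration only (a hypothetical row, not from the thread), this is how the pattern above fires when a required column is missing:

# Standalone mimic of the check: a context-based metric requires
# "contexts", which this row omits.
required_columns = ["question", "answer", "contexts"]
row = {"question": "Who wrote the report?", "answer": "Amnesty International"}

for column in required_columns:
    if column not in row:
        raise ValueError(
            f'"{column}" is required in each example for the metric you have chosen.'
        )
# -> ValueError: "contexts" is required in each example ...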

By ensuring that the schema includes the "reason" field, updating the prompt to instruct the model to include this field, and correctly configuring the Bedrock classes and validation methods, you should be able to resolve the errors [1][2][3].

To continue talking to Dosu, mention @dosu.