surengunturuamazon opened 1 month ago
Hello, @surengunturuamazon! I'm here to help you with any bugs, questions, or contributions.
To resolve the "Failed to parse output. Returning None." and "root -> 1 -> reason field required (type=value_error.missing)" errors when calling evaluate
on Ragas using AWS Bedrock, you need to ensure that the JSON output conforms to the expected schema and includes all required fields. The error message indicates that the "reason" field is missing in the JSON output.
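For context, the error comes from Pydantic validation: Ragas parses the model's JSON into a schema whose "reason" field is required, so any completion that omits it fails. A minimal sketch (the `Verification` model below is a hypothetical stand-in for Ragas' `ContextPrecisionVerification`):

```python
from pydantic import BaseModel, ValidationError

# Hypothetical stand-in for Ragas' verification model; like the real one,
# it requires a "reason" field alongside the verdict.
class Verification(BaseModel):
    reason: str
    verdict: int

# A completion that omits "reason" fails validation, producing the
# "reason ... field required" error seen in the report.
try:
    Verification(verdict=1)
    error_message = ""
except ValidationError as exc:
    error_message = str(exc)
```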
Here is a step-by-step approach to resolve the issue:
1. **Check the schema**: ensure that the schema used in `get_json_format_instructions` includes the "reason" field as a required field.
2. **Modify the schema**: if the "reason" field is not included, add it to the schema.
3. **Update the prompt**: ensure that the prompt used to generate the output instructs the model to include the "reason" field.
Here is an example of how you can modify the schema and prompt:
```python
import json
import typing as t

def get_json_format_instructions(pydantic_object: t.Type[TBaseModel]) -> str:
    # Copy the schema to avoid mutating the original Pydantic schema.
    schema = {k: v for k, v in pydantic_object.schema().items()}
    # Remove extraneous fields.
    reduced_schema = schema
    if "title" in reduced_schema:
        del reduced_schema["title"]
    # Add the "reason" field to the schema if it is missing.
    if "properties" in reduced_schema and "reason" not in reduced_schema["properties"]:
        reduced_schema["properties"]["reason"] = {"type": "string"}
        if "required" in reduced_schema:
            reduced_schema["required"].append("reason")
        else:
            reduced_schema["required"] = ["reason"]
    # Serialize only after patching, so "reason" actually reaches the model;
    # json.dumps also keeps the JSON in context well-formed with double quotes.
    schema_str = json.dumps(reduced_schema)
    return JSON_FORMAT_INSTRUCTIONS.format(schema=schema_str)
```
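As a standalone sketch of the schema patch above, applied to a plain JSON-schema dict (the example schema here is hypothetical, not Ragas' actual output):

```python
import json

# Hypothetical schema, shaped like pydantic_object.schema() output but
# initially missing the "reason" property.
schema = {
    "title": "Verification",
    "type": "object",
    "properties": {"verdict": {"type": "integer"}},
    "required": ["verdict"],
}

schema.pop("title", None)  # drop the extraneous title
if "reason" not in schema.get("properties", {}):
    schema["properties"]["reason"] = {"type": "string"}
    schema.setdefault("required", []).append("reason")

# Serialize only after patching, so the format instructions sent to the
# model actually mention the "reason" field.
schema_str = json.dumps(schema)
```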
```python
# Ensure the prompt instructs the model to include the "reason" field.
FIX_OUTPUT_FORMAT = Prompt(
    name="",
    instruction="Below, the Completion did not satisfy the constraints given in the Prompt.",
    output_format_instruction="Ensure the output includes a 'reason' field.",
    input_keys=["prompt", "completion"],
    output_key="fixed_completion",
)
```
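The intent of such a fix-output prompt is a parse-and-retry loop: when a completion is not valid JSON, the model is re-asked with the broken completion attached, and only after the retries are exhausted is None returned (which is what the "Failed to parse output. Returning None." warning signals). A hedged sketch, where `llm_call` is a hypothetical stand-in for the actual LLM invocation:

```python
import json

def parse_with_retry(llm_call, prompt, max_retries=1):
    """Parse the model's JSON output, re-prompting on failure."""
    completion = llm_call(prompt)
    for _ in range(max_retries + 1):
        try:
            return json.loads(completion)
        except json.JSONDecodeError:
            # Feed the broken completion back to the model for repair.
            completion = llm_call(
                "The Completion below did not satisfy the constraints "
                f"given in the Prompt.\n\nPrompt:\n{prompt}\n\n"
                f"Completion:\n{completion}"
            )
    return None  # mirrors "Failed to parse output. Returning None."

# Usage with a fake model that fails once, then returns valid JSON:
responses = iter(["not json", '{"reason": "relevant", "verdict": 1}'])
parsed = parse_with_retry(lambda p: next(responses), "score this context")
```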
Additionally, when using AWS Bedrock with Ragas, ensure that you have correctly configured the `BedrockChat` and `BedrockEmbeddings` classes from LangChain, as shown in the following example:
```python
from langchain_community.chat_models import BedrockChat
from langchain_community.embeddings import BedrockEmbeddings

config = {
    "region_name": "us-east-1",  # e.g. "us-east-1"
    "model_id": "anthropic.claude-3-haiku-20240307-v1:0",  # e.g. "anthropic.claude-v2"
    "model_kwargs": {"temperature": 0.4},
}

bedrock_model = BedrockChat(
    region_name=config["region_name"],
    endpoint_url=f"https://bedrock-runtime.{config['region_name']}.amazonaws.com",
    model_id=config["model_id"],
    model_kwargs=config["model_kwargs"],
)

# Initialize the embeddings.
bedrock_embeddings = BedrockEmbeddings(
    region_name=config["region_name"],
)
```
Furthermore, ensure that the `EvaluatorChain` class in Ragas correctly validates the required fields. The `_validate` method in the `EvaluatorChain` class should check for the presence of all necessary fields, including "reason":
```python
def _validate(
    self,
    input: dict[str, t.Any],
    question_key: str = "question",
    prediction_key: str = "answer",
    context_key: str = "contexts",
) -> None:
    # Validate each example.
    required_columns = EVALMODE_TO_COLUMNS[self.metric.evaluation_mode]
    if "question" in required_columns and question_key not in input:
        raise ValueError(
            f'"{question_key}" is required in each example '
            f"for the metric[{self.metric.name}] you have chosen."
        )
    if "answer" in required_columns and prediction_key not in input:
        raise ValueError(
            f'"{prediction_key}" is required in each prediction '
            f"for the metric[{self.metric.name}] you have chosen."
        )
    if "contexts" in required_columns and context_key not in input:
        raise ValueError(
            f'"{context_key}" is required in each prediction for the '
            f"metric[{self.metric.name}] you have chosen."
        )
    if "ground_truth" in required_columns and "ground_truth" not in input:
        raise ValueError(
            '"ground_truth" is required in each prediction for the '
            f"metric[{self.metric.name}] you have chosen."
        )
    if "reason" in required_columns and "reason" not in input:
        raise ValueError(
            '"reason" is required in each prediction for the '
            f"metric[{self.metric.name}] you have chosen."
        )
```
By ensuring that the schema includes the "reason" field, updating the prompt to instruct the model to include this field, and correctly configuring the Bedrock classes and validation methods, you should be able to resolve the errors [1][2][3].
- [x] I have checked the documentation and related resources and couldn't resolve my bug.
**Describe the bug**
When calling `evaluate` on Ragas using AWS Bedrock, some examples give the warning "Failed to parse output. Returning None." and the error "root -> 1 -> reason field required (type=value_error.missing)".
Ragas version: 0.1.13
Python version: 3.10.14
**Code to Reproduce**
```python
from langchain_community.chat_models import BedrockChat
from langchain_community.embeddings import BedrockEmbeddings
from datasets import load_dataset

amnesty_qa = load_dataset("explodinggradients/amnesty_qa", "english_v2")

config = {
    "region_name": "us-east-1",  # E.g. "us-east-1"
    "model_id": "anthropic.claude-3-haiku-20240307-v1:0",  # E.g "anthropic.claude-v2"
    "model_kwargs": {"temperature": 0.4},
}

bedrock_model = BedrockChat(
    region_name=config["region_name"],
    endpoint_url=f"https://bedrock-runtime.{config['region_name']}.amazonaws.com",
    model_id=config["model_id"],
    model_kwargs=config["model_kwargs"],
)

bedrock_embeddings = BedrockEmbeddings(
    region_name=config["region_name"],
)

from ragas import evaluate
import nest_asyncio  # CHECK NOTES

nest_asyncio.apply()

result = evaluate(
    amnesty_qa["eval"],
    metrics=metrics,
    llm=bedrock_model,
    embeddings=bedrock_embeddings,
)

result
```
**Error trace**
```
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Exception raised in Job[34]: ValidationError(2 validation errors for ContextPrecisionVerifications
root -> 1 -> reason
  field required (type=value_error.missing)
root -> 1 -> verdict
  field required (type=value_error.missing))
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Failed to parse output. Returning None.
Failed to parse output. Returning None.
```
The above is for a subset of 20 examples that are evaluated.
**Expected behavior**
None of the warnings above should appear.
**Additional context**
This error/warning does not stop execution, and most examples are evaluated smoothly; some examples simply cannot be evaluated, which results in a NaN value for the metric.
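To see how many rows were affected, the per-example scores can be scanned for NaN after the run and those rows re-evaluated or excluded. A sketch over a plain score list (the scores here are made up; in Ragas, `result.to_pandas()` exposes per-row scores):

```python
import math

# Hypothetical per-example scores for one metric, where failed
# parses surface as NaN.
scores = [0.91, float("nan"), 0.78, float("nan"), 0.85]

# Indices of rows to re-run or exclude from aggregates.
failed_rows = [i for i, s in enumerate(scores) if math.isnan(s)]

# Aggregate over successful rows only, so NaN does not poison the mean.
valid = [s for s in scores if not math.isnan(s)]
mean_score = sum(valid) / len(valid)
```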