Open Simeon-Krah opened 1 month ago
Hey @Simeon-Krah! I'm here to help you with bugs, questions, and more. Let's tackle this issue together!
I found a similar issue that might help you resolve the ValidationError "value is not a valid list" for SingleTurnSample when running the evaluate method. SingleTurnSample expects user_input to be a string, not a list, and the retrieved_contexts field to be a list of strings.
Here is an example of how to correctly create a SingleTurnSample:
from ragas.dataset_schema import SingleTurnSample
# Correct usage for SingleTurnSample
single_turn_sample = SingleTurnSample(user_input="What is X", response="Y", retrieved_contexts=["context1", "context2"])
Ensure that retrieved_contexts is always a list of strings when you create a SingleTurnSample instance. Any other type, such as a bare string, raises the ValidationError.
For example, this will raise an error:
# Incorrect usage for SingleTurnSample
single_turn_sample = SingleTurnSample(user_input="What is X", response="Y", retrieved_contexts="context1")
Make sure your code aligns with the correct usage as shown in the tests:
from ragas.dataset_schema import EvaluationDataset, SingleTurnSample

def test_evaluation_dataset():
    # retrieved_contexts is passed as a list of strings
    single_turn_sample = SingleTurnSample(user_input="What is X", response="Y", retrieved_contexts=["context1", "context2"])
    dataset = EvaluationDataset(samples=[single_turn_sample, single_turn_sample])
    hf_dataset = dataset.to_hf_dataset()
    assert dataset.get_sample_type() == SingleTurnSample
    assert len(hf_dataset) == 2
    assert dataset.features() == {"response", "user_input", "retrieved_contexts"}
    assert len(dataset) == 2
    assert dataset[0] == single_turn_sample
This ensures that retrieved_contexts is a list of strings, which is valid for SingleTurnSample [1].
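As a quick way to check a value before building a sample, the rule above (a list whose items are all strings) can be sketched in plain Python. The helper name is_valid_contexts is hypothetical and not part of ragas; it only mirrors the type rule described here.

```python
def is_valid_contexts(value):
    """Return True only if value is a list whose items are all strings.

    Hypothetical helper, not a ragas API.
    """
    return isinstance(value, list) and all(isinstance(item, str) for item in value)


good = ["context1", "context2"]  # list of strings
bad = "context1"                 # bare string, the case that triggers the error

print(is_valid_contexts(good))  # True
print(is_valid_contexts(bad))   # False
```

Running this over every row before calling evaluate shows which rows would fail validation.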
To continue talking to Dosu, mention @dosu.
I have a dataframe with four questions and answers, context and ground_truths to each. So why doesn't it work like we have in the docs?
Your dataframe might not be working as expected with the evaluate method because it is missing required keys. According to the _validate_langsmith_eval method, example.inputs must include the key "question" and example.outputs must include the key "ground_truth". Additionally, run.outputs must include the keys "answer" and "contexts" [1].
Ensure that your dataframe includes these required keys to work correctly with the evaluate method. Here is an example of how to set up and evaluate a QA chain:
from ragas.langchain.evalchain import RagasEvaluatorChain
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    context_precision,
    context_recall,
)
# create evaluation chains
faithfulness_chain = RagasEvaluatorChain(metric=faithfulness)
answer_rel_chain = RagasEvaluatorChain(metric=answer_relevancy)
context_rel_chain = RagasEvaluatorChain(metric=context_precision)
context_recall_chain = RagasEvaluatorChain(metric=context_recall)
# Evaluate a list of inputs/queries and the outputs/predictions from the QA chain.
predictions = qa_chain.batch(examples)
# evaluate
print("evaluating...")
r = faithfulness_chain.evaluate(examples, predictions)
r
Additionally, ensure that the fields in your dataframe have the following data types:
question: list of strings
contexts: list of lists of strings
answer: list of strings
ground_truth: list of lists of strings [2]
Make sure your dataframe aligns with the expected structure and includes the necessary keys for evaluation.
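The checklist above can be turned into a small pre-flight check. This is a minimal sketch, assuming the data is held as a column-oriented dict with the four column names listed above; REQUIRED_COLUMNS and check_columns are hypothetical names, not ragas APIs.

```python
# Column names taken from the list above; adjust if your schema differs.
REQUIRED_COLUMNS = {"question", "contexts", "answer", "ground_truth"}


def check_columns(data):
    """Return a list of human-readable problems found in a column-oriented dict."""
    problems = []
    missing = REQUIRED_COLUMNS - set(data)
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
    # Each entry of 'contexts' should itself be a list of strings.
    for i, ctx in enumerate(data.get("contexts", [])):
        if not (isinstance(ctx, list) and all(isinstance(c, str) for c in ctx)):
            problems.append(f"contexts[{i}] is {type(ctx).__name__}, expected list of str")
    return problems


data = {
    "question": ["What is X?"],
    "contexts": ["context1"],  # bare string instead of a list
    "answer": ["Y"],
}
for problem in check_columns(data):
    print(problem)
```

An empty result means the columns are present and every contexts entry has the expected shape.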
Thank you. I just needed to wrap each of my contexts in a list, so that the whole column is a list of lists. How do I write Python code to transform a list into a list of lists, though?
You can transform a list of contexts into a list of lists using a simple list comprehension. Here's an example:
# Original list of contexts
contexts = ["context1", "context2", "context3"]
# Transform into a list of lists
contexts_list_of_lists = [[context] for context in contexts]
print(contexts_list_of_lists)
# Output: [['context1'], ['context2'], ['context3']]
If you have a dataframe and need to apply this transformation to a specific column, you can use the apply method:
import pandas as pd
# Example dataframe
data = {
    'question': ["What is X?", "What is Y?"],
    'contexts': ["context1", "context2"],
    'answer': ["Answer1", "Answer2"],
    'ground_truth': ["ground_truth1", "ground_truth2"]
}
df = pd.DataFrame(data)
# Transform the 'contexts' column into a list of lists
df['contexts'] = df['contexts'].apply(lambda x: [x])
print(df)
This will ensure that each context is wrapped in a list, making it compatible with the expected input format for the evaluate method.
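One caveat with wrapping unconditionally is that a value that is already a list gets wrapped a second time. A minimal sketch of a wrapper that is safe to apply repeatedly follows; ensure_list is a hypothetical helper name, not part of pandas or ragas.

```python
def ensure_list(value):
    """Wrap a value in a list unless it already is one (hypothetical helper)."""
    if isinstance(value, list):
        return value
    return [value]


# Mixed input: one bare string and one entry that is already a list
contexts = ["context1", ["context2a", "context2b"]]
normalized = [ensure_list(c) for c in contexts]
print(normalized)
# Output: [['context1'], ['context2a', 'context2b']]
```

With a dataframe, the same function can be passed to apply in place of the lambda, for example df['contexts'].apply(ensure_list).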
Hi @Simeon-Krah : Did this help? I tried doing below:
eval_data = {"questions": QUESTIONS, "ground_truth": GROUND_TRUTH, "contexts": contexts, "answers": answers}
df_eval = pd.DataFrame(eval_data)
df_eval['contexts'] = df_eval['contexts'].apply(lambda x: [x])
df_eval['ground_truth'] = df_eval['ground_truth'].apply(lambda x: [x])
df_eval_st = df_eval.astype(str)
dataset = Dataset.from_pandas(df_eval_st)
But getting the same error: value is not a valid list (type=type_error.list)
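One possible cause, judging only from the snippet above and not confirmed in this thread, is the astype(str) call. It converts every cell to its string form, so a wrapped context such as ['context1'] becomes the text "['context1']" and is no longer a list when it reaches validation. The effect on a single cell can be sketched in plain Python, using str() as a stand-in for what astype(str) does to each value:

```python
import ast

wrapped = ["context1"]   # what the apply(lambda x: [x]) step produces
as_text = str(wrapped)   # stand-in for what astype(str) does to the cell

print(type(wrapped).__name__)  # list
print(type(as_text).__name__)  # str
print(as_text)                 # ['context1']

# If the data has already been stringified, the list can be parsed back
restored = ast.literal_eval(as_text)
print(restored == wrapped)     # True
```

If this is the cause, dropping the astype(str) step, or casting only the scalar text columns, would keep the contexts column as lists.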
[ ] I have checked the documentation and related resources and couldn't resolve my bug.
Describe the bug
I am using LangChain for my agent. I have implemented the system so that I get the answers and contexts for my inputs, which I have combined with the ground truths to create a dataframe. I created an eval set using the from_pandas method, which works fine. But when I run the evaluate method, I get the error:
ValidationError: 1 validation error for SingleTurnSample
retrieved_contexts
  value is not a valid list (type=type_error.list)
Ragas version: 0.1.18
Python version: 3.10.12
Code to Reproduce
from datasets import Dataset
eval_dataset = Dataset.from_pandas(df)
eval_dataset
# error side
scores = evaluate(
    eval_dataset,
    metrics=[context_recall, context_precision, faithfulness, answer_correctness],
    llm=llm,
    embeddings=GoogleGenerativeAIEmbeddings(google_api_key="changedtoprotectmyapiFcQZJa-E-QJOxyjK_X8rpBkIQ", model="models/embedding-001"),
)
scores
Error trace
    339 values, fields_set, validation_error = validate_model(__pydantic_self__.__class__, data)
    340 if validation_error:
--> 341     raise validation_error
    342 try:
    343     object_setattr(__pydantic_self__, '__dict__', values)
ValidationError: 1 validation error for SingleTurnSample
retrieved_contexts
  value is not a valid list (type=type_error.list)
Expected behavior
I wanted it to run smoothly and perform the evaluation for me.
Additional context
So yeah, every other part of my code runs, and I can create the dataset:
Dataset({
    features: ['question', 'contexts', 'answer', 'reference'],
    num_rows: 4
})
But running the evaluate method raises the error shown above.