Simeon-Krah commented 1 month ago

[ ] I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug

I am using LangChain for my agent. I have implemented the system so that I get the answers and contexts for my inputs, which I have combined with the ground truths into a single dataframe. I created an eval set using the from_pandas method, which works fine. But when I run the evaluate method, I get this error:

ValidationError: 1 validation error for SingleTurnSample
retrieved_contexts
  value is not a valid list (type=type_error.list)

Ragas version: 0.1.18
Python version: 3.10.12

Code to Reproduce

from datasets import Dataset

eval_dataset = Dataset.from_pandas(df)
eval_dataset

# error side
scores = evaluate(
    eval_dataset,
    metrics=[context_recall, context_precision, faithfulness, answer_correctness],
    llm=llm,
    embeddings=GoogleGenerativeAIEmbeddings(
        google_api_key="changedtoprotectmyapiFcQZJa-E-QJOxyjK_X8rpBkIQ",
        model="models/embedding-001",
    ),
)

scores

Error trace

    339         values, fields_set, validation_error = validate_model(__pydantic_self__.__class__, data)
    340         if validation_error:
--> 341             raise validation_error
    342         try:
    343             object_setattr(__pydantic_self__, '__dict__', values)

ValidationError: 1 validation error for SingleTurnSample
retrieved_contexts
  value is not a valid list (type=type_error.list)

Expected behavior

I expected it to run smoothly and perform the evaluation.

Additional context

Every other part of my code runs, and I end up with this dataset:

Dataset({
    features: ['question', 'contexts', 'answer', 'reference'],
    num_rows: 4
})

But running the evaluate method raises the error shown above.

dosubot[bot] commented 1 month ago

Hey @Simeon-Krah! I'm here to help you with bugs, questions, and more. Let's tackle this issue together!

I found a similar issue that might help you resolve the ValidationError: "value is not a valid list" for SingleTurnSample when running the evaluate method. The SingleTurnSample expects user_input to be a string, not a list. Ensure that the retrieved_contexts field is a list of strings.

Here is an example of how to correctly create a SingleTurnSample:

from ragas.dataset_schema import SingleTurnSample

# Correct usage for SingleTurnSample
single_turn_sample = SingleTurnSample(user_input="What is X", response="Y", retrieved_contexts=["context1", "context2"])

Ensure that when you create instances of SingleTurnSample, the retrieved_contexts is always a list of strings. If you mistakenly provide a different type, it will raise the ValidationError.

For example, this will raise an error:

# Incorrect usage for SingleTurnSample
single_turn_sample = SingleTurnSample(user_input="What is X", response="Y", retrieved_contexts="context1")

Make sure your code aligns with the correct usage as shown in the tests:

def test_evaluation_dataset():
    single_turn_sample = SingleTurnSample(user_input="What is X", response="Y", retrieved_contexts=["context1", "context2"])

    dataset = EvaluationDataset(samples=[single_turn_sample, single_turn_sample])

    hf_dataset = dataset.to_hf_dataset()

    assert dataset.get_sample_type() == SingleTurnSample
    assert len(hf_dataset) == 2
    assert dataset.features() == {"response", "user_input", "retrieved_contexts"}
    assert len(dataset) == 2
    assert dataset[0] == single_turn_sample

This ensures that retrieved_contexts is a list of strings, which is valid for SingleTurnSample ^[1].
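
One way to find the offending rows before calling evaluate is to check each contexts value directly. The following is a minimal sketch in plain Python; `is_list_of_str` and `find_bad_rows` are hypothetical helper names, not part of Ragas, and the sample rows are made up for illustration.

```python
from typing import Any, List


def is_list_of_str(value: Any) -> bool:
    """Return True only if value is a list whose items are all strings."""
    return isinstance(value, list) and all(isinstance(item, str) for item in value)


def find_bad_rows(contexts_column: List[Any]) -> List[int]:
    """Return the indices of rows whose contexts are not a list of strings."""
    return [i for i, value in enumerate(contexts_column) if not is_list_of_str(value)]


# Illustrative data: row 1 is a bare string, row 2 is a stringified list.
contexts_column = [["context1", "context2"], "context3", "['context4']"]
bad_rows = find_bad_rows(contexts_column)
print(bad_rows)
```

With a dataframe, the column can be passed in as `df["contexts"].tolist()`. Any index that comes back points at a row that would probably trigger the "value is not a valid list" error.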

To continue talking to Dosu, mention @dosu.

Simeon-Krah commented 1 month ago

I have a dataframe with four questions, each with an answer, contexts, and a ground truth. So why doesn't it work the way the docs show?

dosubot[bot] commented 1 month ago

Your dataframe might not be working as expected with the evaluate method because it is missing required keys. According to the _validate_langsmith_eval method, the dataframe must include the keys "question" and "ground_truth" in example.inputs and example.outputs, respectively. Additionally, the run.outputs must include the keys "answer" and "contexts" ^[1].

Ensure that your dataframe includes these required keys to work correctly with the evaluate method. Here is an example of how to set up and evaluate a QA chain:

from ragas.langchain.evalchain import RagasEvaluatorChain
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    context_precision,
    context_recall,
)

# create evaluation chains
faithfulness_chain = RagasEvaluatorChain(metric=faithfulness)
answer_rel_chain = RagasEvaluatorChain(metric=answer_relevancy)
context_rel_chain = RagasEvaluatorChain(metric=context_precision)
context_recall_chain = RagasEvaluatorChain(metric=context_recall)

# Evaluate a list of inputs/queries and the outputs/predictions from the QA chain.
predictions = qa_chain.batch(examples)

# evaluate
print("evaluating...")
r = faithfulness_chain.evaluate(examples, predictions)
r

Additionally, ensure that the fields in your dataframe have the following data types:

question: list of strings
contexts: list of lists of strings
answer: list of strings
ground_truth: list of lists of strings ^[2].

Make sure your dataframe aligns with the expected structure and includes the necessary keys for evaluation.
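
The key requirements described in this reply can be checked up front. This sketch assumes the column names mentioned above ("question", "contexts", "answer", "ground_truth"); the names actually expected may differ between Ragas versions, and `check_columns` is a hypothetical helper, not a Ragas function.

```python
from typing import Dict, List, Sequence

# Column names taken from the reply above; an assumption, since they vary by Ragas version.
REQUIRED_COLUMNS = ("question", "contexts", "answer", "ground_truth")


def check_columns(data: Dict[str, Sequence], required: Sequence[str] = REQUIRED_COLUMNS) -> List[str]:
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = [f"missing column: {name}" for name in required if name not in data]
    lengths = {name: len(values) for name, values in data.items()}
    if len(set(lengths.values())) > 1:
        problems.append(f"columns have different lengths: {lengths}")
    return problems


# Illustrative data with the ground_truth column deliberately left out.
data = {
    "question": ["What is X?", "What is Y?"],
    "contexts": [["context1"], ["context2"]],
    "answer": ["Answer1", "Answer2"],
}
problems = check_columns(data)
print(problems)
```

Running a check like this before building the Dataset separates naming problems from type problems, which produce different errors.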


Simeon-Krah commented 1 month ago

Thank you. I just needed to wrap each of my contexts in a list, so that the column becomes a list of lists. How do I write Python code to turn a list into a list of lists, though?

dosubot[bot] commented 1 month ago

You can transform a list of contexts into a list of lists using a simple list comprehension. Here's an example:

# Original list of contexts
contexts = ["context1", "context2", "context3"]

# Transform into a list of lists
contexts_list_of_lists = [[context] for context in contexts]

print(contexts_list_of_lists)
# Output: [['context1'], ['context2'], ['context3']]

If you have a dataframe and you need to apply this transformation to a specific column, you can use the apply method:

import pandas as pd

# Example dataframe
data = {
    'question': ["What is X?", "What is Y?"],
    'contexts': ["context1", "context2"],
    'answer': ["Answer1", "Answer2"],
    'ground_truth': ["ground_truth1", "ground_truth2"]
}

df = pd.DataFrame(data)

# Transform the 'contexts' column into a list of lists
df['contexts'] = df['contexts'].apply(lambda x: [x])

print(df)

This will ensure that each context is wrapped in a list, making it compatible with the expected input format for the evaluate method.
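
One caveat with `lambda x: [x]` is that it wraps values that are already lists a second time. Below is a minimal sketch of a wrapper that avoids this; `as_context_list` is a hypothetical helper, not part of Ragas or pandas.

```python
from typing import Any, List


def as_context_list(value: Any) -> List[str]:
    """Wrap a single context string in a list; pass existing lists or tuples through."""
    if isinstance(value, str):
        return [value]
    if isinstance(value, (list, tuple)):
        return [str(item) for item in value]
    raise TypeError(f"unsupported contexts value: {type(value).__name__}")


# Illustrative rows: one bare string, one list that is already in the right shape.
rows = ["context1", ["context2", "context3"]]
normalized = [as_context_list(row) for row in rows]
print(normalized)
```

On a dataframe, the same function could be applied with `df['contexts'] = df['contexts'].apply(as_context_list)`.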


ahmer09 commented 1 month ago

Hi @Simeon-Krah, did this help? I tried the following:

eval_data = {
    "questions": QUESTIONS,
    "ground_truth": GROUND_TRUTH,
    "contexts": contexts,
    "answers": answers,
}
df_eval = pd.DataFrame(eval_data)
df_eval['contexts'] = df_eval['contexts'].apply(lambda x: [x])
df_eval['ground_truth'] = df_eval['ground_truth'].apply(lambda x: [x])
df_eval_st = df_eval.astype(str)

# RAGAS expects a Dataset object
dataset = Dataset.from_pandas(df_eval_st)

But I am still getting the same error: value is not a valid list (type=type_error.list)
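
A likely cause, offered as an assumption since the thread does not confirm it: `astype(str)` converts every cell to a string, so a list-valued cell such as `['context1']` becomes the text `"['context1']"`, which is no longer a list. The plain-Python sketch below mimics that conversion with `str()` and shows one way to recover the list; `restore_list` is a hypothetical helper.

```python
import ast
from typing import Any, List


def restore_list(value: Any) -> List[str]:
    """Turn a stringified list back into a list of strings; leave real lists alone."""
    if isinstance(value, list):
        return [str(item) for item in value]
    if isinstance(value, str):
        try:
            parsed = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            return [value]  # an ordinary string: treat it as a single context
        if isinstance(parsed, list):
            return [str(item) for item in parsed]
        return [value]
    raise TypeError(f"unsupported contexts value: {type(value).__name__}")


wrapped = ["context1"]       # what apply(lambda x: [x]) produces for one cell
stringified = str(wrapped)   # mimics what the cell holds after astype(str)
print(type(stringified).__name__)
print(restore_list(stringified))
```

Dropping the `astype(str)` call, or applying it only to the plain-text columns, would avoid the conversion in the first place.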

explodinggradients / ragas

ValidationError #1317
