explodinggradients / ragas

Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines
https://docs.ragas.io
Apache License 2.0

Not able to calculate noise_sensitivity_relevant using Azure Open AI. #1283

Open wanjeakshay opened 5 days ago

wanjeakshay commented 5 days ago

[ ] I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug: I can calculate four metrics ({'faithfulness': 1.0000, 'answer_relevancy': 0.9880, 'context_recall': 1.0000, 'context_precision': 1.0000}), but when I try to calculate noise_sensitivity_relevant it says the "ground_truth" column is not present, even though my dataset has a ground_truth column.

Ragas version: 0.1.18
Python version: 3.10.12

Code to Reproduce

from ragas.metrics import noise_sensitivity_relevant

metrics = [
    noise_sensitivity_relevant,
]
print(dataset)
result_2 = evaluate(
    dataset, metrics=metrics, llm=azure_model, embeddings=azure_embeddings
)

Error trace

Dataset({
    features: ['question', 'ground_truth', 'answer', 'contexts', 'retreived_contexts'],
    num_rows: 1
})

ValueError                                Traceback (most recent call last)
in <cell line: 6>()
      4 ]
      5 print(dataset)
----> 6 result_2 = evaluate(
      7     dataset, metrics=metrics, llm=azure_model, embeddings=azure_embeddings
      8 )

2 frames
/usr/local/lib/python3.10/dist-packages/ragas/_analytics.py in wrapper(*args, **kwargs)
    127 def wrapper(*args: P.args, **kwargs: P.kwargs) -> t.Any:
    128     track(IsCompleteEvent(event_type=func.__name__, is_completed=False))
--> 129     result = func(*args, **kwargs)
    130     track(IsCompleteEvent(event_type=func.__name__, is_completed=True))
    131

/usr/local/lib/python3.10/dist-packages/ragas/evaluation.py in evaluate(dataset, metrics, llm, embeddings, callbacks, in_ci, run_config, token_usage_parser, raise_exceptions, column_map)
    175
    176 if isinstance(dataset, EvaluationDataset):
--> 177     validate_required_columns(dataset, metrics)
    178     validate_supported_metrics(dataset, metrics)
    179

/usr/local/lib/python3.10/dist-packages/ragas/validation.py in validate_required_columns(ds, metrics)
     60 available_columns = ds.features()
     61 if not required_columns.issubset(available_columns):
---> 62     raise ValueError(
     63         f"The metric [{m.name}] that that is used requires the following "
     64         f"additional columns {list(required_columns - available_columns)} "

ValueError: The metric [noise_sensitivity_relevant] that that is used requires the following additional columns ['ground_truth'] to be present in the dataset.
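Note: the features printed above include a misspelled 'retreived_contexts' column alongside the legacy 'question'/'answer'/'contexts' names. A minimal sketch of the rename that column needs, using plain dicts with placeholder values (with a Hugging Face datasets.Dataset, the equivalent call is dataset.rename_column("retreived_contexts", "retrieved_contexts")):

```python
# Plain-dict sketch of the rename; the "..." values are placeholders,
# not the reporter's actual data.
data = {
    "question": ["..."],
    "ground_truth": ["..."],
    "answer": ["..."],
    "contexts": [["..."]],
    "retreived_contexts": [["..."]],  # note the typo in the original dataset
}

# Rename the misspelled key in place.
data["retrieved_contexts"] = data.pop("retreived_contexts")

print(sorted(data))
# → ['answer', 'contexts', 'ground_truth', 'question', 'retrieved_contexts']
```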

Expected behavior: It should return the result.


dosubot[bot] commented 5 days ago

Hey @wanjeakshay! I'm here to help you with your bug. I'm a bot that can assist you with bug fixes, answer questions, and guide you to become a contributor.

To resolve the ValueError indicating that the 'ground_truth' column is not present when calculating noise_sensitivity_relevant, ensure that your dataset includes the 'ground_truth' column and that it is correctly referenced in the code. The NoiseSensitivity class requires the 'ground_truth' column as part of its _required_columns for MetricType.SINGLE_TURN.

Here is the relevant part of the code that checks for the required columns:

@dataclass
class NoiseSensitivity(MetricWithLLM, SingleTurnMetric):
    name: str = "noise_sensitivity"  # type: ignore
    focus: str = "relevant"
    _required_columns: t.Dict[MetricType, t.Set[str]] = field(
        default_factory=lambda: {
            MetricType.SINGLE_TURN: {
                "user_input",
                "response",
                "ground_truth",
                "retrieved_contexts",
            }
        }
    )
    ...

Make sure your dataset includes the 'ground_truth' column and that it is correctly passed to the NoiseSensitivity class. If the column is present in your dataset but still causing issues, verify that the data is being correctly loaded and passed to the class methods.
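One quick way to verify this before calling evaluate() is to diff the dataset's columns against the metric's required set. A minimal sketch in plain Python (no ragas or datasets dependency; the column list below is the one from the traceback in the original report):

```python
# Required columns for noise_sensitivity_relevant (SINGLE_TURN), per the
# _required_columns definition shown above.
required = {"user_input", "response", "ground_truth", "retrieved_contexts"}

def missing_columns(columns):
    """Return the required columns absent from the dataset, sorted for readability."""
    return sorted(required - set(columns))

# Columns from the dataset printed in the original report:
print(missing_columns(["question", "ground_truth", "answer", "contexts", "retreived_contexts"]))
# → ['response', 'retrieved_contexts', 'user_input']
```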

Here is an example of how you might structure your dataset:

dataset = [
    {
        "user_input": "What is the capital of France?",
        "response": "The capital of France is Paris.",
        "ground_truth": "Paris is the capital of France.",
        "retrieved_contexts": ["Paris is the capital and most populous city of France."]
    },
    # Add more samples as needed
]

Ensure that when you create an instance of NoiseSensitivity and call its methods, the dataset rows include the 'ground_truth' key:

noise_sensitivity_relevant = NoiseSensitivity()

# Example of calling the method with a sample row
sample_row = dataset[0]
# `callbacks` is assumed to be defined in your setup (e.g. a Callbacks object)
score = await noise_sensitivity_relevant._ascore(sample_row, callbacks)

Additionally, you can refer to the following example from the ragas documentation to ensure your dataset is correctly formatted and used:

from datasets import Dataset 
from ragas.metrics import noise_sensitivity_relevant, noise_sensitivity_irrelevant
from ragas import evaluate

data_sample = {
    "question": ["What is the Life Insurance Corporation of India (LIC) known for?"],
    "ground_truth": ["The Life Insurance Corporation of India (LIC) is the largest insurance company in India, established in 1956 through the nationalization of the insurance industry. It is known for managing a large portfolio of investments."],
    "answer": ["The Life Insurance Corporation of India (LIC) is the largest insurance company in India, known for its vast portfolio of investments. LIC contributes to the financial stability of the country."],
    "contexts": [[
        "The Life Insurance Corporation of India (LIC) was established in 1956 following the nationalization of the insurance industry in India.",
        "LIC is the largest insurance company in India, with a vast network of policyholders and huge investments.",
        "As the largest institutional investor in India, LIC manages substantial funds, contributing to the financial stability of the country.",
        "The Indian economy is one of the fastest-growing major economies in the world, thanks to sectors like finance, technology, manufacturing, etc."
    ]]
}

dataset = Dataset.from_dict(data_sample)
metrics = [noise_sensitivity_relevant, noise_sensitivity_irrelevant]
score = evaluate(dataset, metrics=metrics)
score.to_pandas()

By ensuring the 'ground_truth' column is present and correctly referenced, you should be able to resolve the ValueError [1][2].

To continue talking to Dosu, mention @dosu.

wanjeakshay commented 5 days ago

This is what my dataset looks like:

Dataset({
    features: ['user_input', 'response', 'ground_truth', 'retrieved_contexts'],
    num_rows: 1
})

The error I am getting:

ValueError                                Traceback (most recent call last)
in <cell line: 6>()
      4 ]
      5 print(dataset1)
----> 6 result_2 = evaluate(
      7     dataset1, metrics=metrics, llm=azure_model, embeddings=azure_embeddings
      8 )

2 frames
/usr/local/lib/python3.10/dist-packages/ragas/validation.py in validate_required_columns(ds, metrics)
     60 available_columns = ds.features()
     61 if not required_columns.issubset(available_columns):
---> 62     raise ValueError(
     63         f"The metric [{m.name}] that that is used requires the following "
     64         f"additional columns {list(required_columns - available_columns)} "

ValueError: The metric [noise_sensitivity_relevant] that that is used requires the following additional columns ['ground_truth'] to be present in the dataset.

dosubot[bot] commented 5 days ago

Ensure that the column names in your dataset exactly match the required names. The noise_sensitivity_relevant metric requires the columns user_input, response, ground_truth, and retrieved_contexts.

Here is an example of how to structure your dataset correctly:

from datasets import Dataset 
from ragas.metrics import noise_sensitivity_relevant, noise_sensitivity_irrelevant
from ragas import evaluate

data_sample = {
    "user_input": ["What is the Life Insurance Corporation of India (LIC) known for?"],
    "ground_truth": ["The Life Insurance Corporation of India (LIC) is the largest insurance company in India, established in 1956 through the nationalization of the insurance industry. It is known for managing a large portfolio of investments."],
    "response": ["The Life Insurance Corporation of India (LIC) is the largest insurance company in India, known for its vast portfolio of investments. LIC contributes to the financial stability of the country."],
    "retrieved_contexts": [[
        "The Life Insurance Corporation of India (LIC) was established in 1956 following the nationalization of the insurance industry in India.",
        "LIC is the largest insurance company in India, with a vast network of policyholders and huge investments.",
        "As the largest institutional investor in India, LIC manages substantial funds, contributing to the financial stability of the country.",
        "The Indian economy is one of the fastest-growing major economies in the world, thanks to sectors like finance, technology, manufacturing, etc."
    ]]
}

dataset = Dataset.from_dict(data_sample)
metrics = [noise_sensitivity_relevant, noise_sensitivity_irrelevant]
score = evaluate(dataset, metrics=metrics)
print(score.to_pandas())

Ensure that the column names in your dataset are exactly user_input, response, ground_truth, and retrieved_contexts without any typos or case differences [1][2].
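A quick way to check for such mismatches is to inspect each column name with repr(), which exposes hidden whitespace or case differences that a plain print would mask. A hedged sketch (the column list below is illustrative, not the user's actual data; the trailing space is a hypothetical example of the kind of mismatch to look for):

```python
# Illustrative column list; the trailing space on "ground_truth " is a
# hypothetical mismatch, not taken from the reporter's dataset.
columns = ["user_input", "response", "ground_truth ", "retrieved_contexts"]

# Print each name with repr() so stray whitespace is visible.
for name in columns:
    print(repr(name))

# Flag names that differ from their stripped or lowercased form.
suspicious = [c for c in columns if c != c.strip() or c != c.lower()]
print(suspicious)  # → ['ground_truth ']
```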

To continue talking to Dosu, mention @dosu.