explodinggradients / ragas

Supercharge Your LLM Application Evaluations 🚀
https://docs.ragas.io
Apache License 2.0

RAGAS TimeoutError Exception while Evaluation #1280

Open kaan9700 opened 1 month ago

kaan9700 commented 1 month ago

I have a self-created dataset with questions, ground truths (GT), contexts, and answers, and I started the evaluation with the RAGAS evaluate() method. The progress percentage rises a few points and then stalls. I then get several identical errors, after which the evaluation continues and freezes again a few minutes later.

Ragas version: 0.1.13
Python version: 3.10.13

Code to Reproduce

import pandas as pd
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    faithfulness,
    context_recall,
    context_precision,
    context_utilization,
    context_entity_recall,
    answer_correctness,
    answer_similarity,
)
from ragas.metrics.critique import harmfulness
from dotenv import load_dotenv

load_dotenv()

df = pd.read_excel('testset_full-200-ES.xlsx')

# Create the 'data_samples' dictionary structure
data_samples = {
    'question': df['question'].tolist(),
    'answer': df['answer'].tolist(),
    'contexts': df['contexts'].apply(lambda x: [x] if pd.notna(x) else []).tolist(),
    'ground_truth': df['ground_truth'].tolist()
}
dataset = Dataset.from_dict(data_samples)

result = evaluate(
    dataset,
    metrics=[
        context_precision,
        faithfulness,
        answer_relevancy,
        context_recall,
        context_utilization,
        context_entity_recall,
        answer_correctness,
        answer_similarity,
        harmfulness,
    ],
)

df = result.to_pandas()
print(result)
# save evaluation results to csv
df.to_csv('results_es.csv', index=False)

Error trace

Evaluating:  11%|█         | 1709/15453 [14:40<330:40:17, 86.61s/it]
Exception raised in Job[13246]: TimeoutError()
Exception raised in Job[13243]: TimeoutError()
Exception raised in Job[5118]: TimeoutError()
Exception raised in Job[13245]: TimeoutError()
Exception raised in Job[5125]: TimeoutError()
Exception raised in Job[13241]: TimeoutError()
Exception raised in Job[5126]: TimeoutError()
Exception raised in Job[13227]: TimeoutError()
Exception raised in Job[5122]: TimeoutError()
Exception raised in Job[13244]: TimeoutError()
Exception raised in Job[13240]: TimeoutError()
Exception raised in Job[13236]: TimeoutError()
Exception raised in Job[5113]: TimeoutError()
Exception raised in Job[5124]: TimeoutError()
Exception raised in Job[14126]: TimeoutError()
Evaluating:  13%|█▎        | 2063/15453 [33:06<750:37:32, 201.81s/it]
Exception raised in Job[8734]: TimeoutError()
Exception raised in Job[8731]: TimeoutError()
Exception raised in Job[4239]: TimeoutError()
Exception raised in Job[8727]: TimeoutError()
Exception raised in Job[1626]: TimeoutError()
Exception raised in Job[12353]: TimeoutError()
Exception raised in Job[4231]: TimeoutError()
Exception raised in Job[12354]: TimeoutError()
Exception raised in Job[4235]: TimeoutError()
Exception raised in Job[12358]: TimeoutError()
Exception raised in Job[4236]: TimeoutError()
Exception raised in Job[8732]: TimeoutError()
Exception raised in Job[1625]: TimeoutError()
Exception raised in Job[1630]: TimeoutError()
Exception raised in Job[1617]: TimeoutError()
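
The job ids in the trace above appear in a few tight ranges rather than being spread evenly. A minimal sketch for summarising such a trace, assuming the log has been saved as text: it extracts the ids of timed-out jobs and groups nearby ids into clusters. The names `timed_out_jobs` and `group_runs` and the `gap` threshold are hypothetical helpers for illustration. The thread does not say what a job index maps to (row, metric, or both), so the sketch only groups ids and does not interpret them.

```python
import re

# Matches the exact message format shown in the error trace.
JOB_TIMEOUT = re.compile(r"Exception raised in Job\[(\d+)\]: TimeoutError\(\)")


def timed_out_jobs(log_text: str) -> list[int]:
    """Return the sorted ids of all jobs that raised TimeoutError."""
    # findall scans the whole text, so it also catches messages that are
    # fused onto the end of a progress-bar line.
    return sorted(int(job_id) for job_id in JOB_TIMEOUT.findall(log_text))


def group_runs(job_ids: list[int], gap: int = 20) -> list[tuple[int, int, int]]:
    """Group ids into (first, last, count) clusters split where ids differ by more than `gap`."""
    clusters: list[tuple[int, int, int]] = []
    for job_id in sorted(job_ids):
        if clusters and job_id - clusters[-1][1] <= gap:
            first, _, count = clusters[-1]
            clusters[-1] = (first, job_id, count + 1)
        else:
            clusters.append((job_id, job_id, 1))
    return clusters


if __name__ == "__main__":
    sample = (
        "Evaluating:  11%|... [14:40<330:40:17, 86.61s/it]"
        "Exception raised in Job[13246]: TimeoutError()\n"
        "Exception raised in Job[13243]: TimeoutError()\n"
        "Exception raised in Job[5118]: TimeoutError()\n"
    )
    for first, last, count in group_runs(timed_out_jobs(sample)):
        print(f"jobs {first}-{last}: {count} timeout(s)")
```

Clusters of adjacent ids may indicate that a batch of jobs submitted together stalled at the same time, which would be worth checking before concluding that individual samples are at fault.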
dosubot[bot] commented 1 month ago

Hey @kaan9700! How can I assist you today?

I found some similar issues that might help you resolve the TimeoutError when using the evaluate() function in the ragas library.

  1. Timeout Configuration: Set the timeout parameter in RunConfig and pass that config to evaluate() through its run_config argument. Here is an example:

    from ragas import evaluate, RunConfig
    from ragas.metrics import (
        answer_relevancy,
        faithfulness,
        context_recall,
        context_precision,
        context_utilization,
        context_entity_recall,
        answer_correctness,
        answer_similarity,
    )
    from ragas.metrics.critique import harmfulness
    from datasets import Dataset
    import pandas as pd

    # Load your dataset
    df = pd.read_excel('testset_full-200-ES.xlsx')
    data_samples = {
        'question': df['question'].tolist(),
        'answer': df['answer'].tolist(),
        'contexts': df['contexts'].apply(lambda x: [x] if pd.notna(x) else []).tolist(),
        'ground_truth': df['ground_truth'].tolist()
    }
    dataset = Dataset.from_dict(data_samples)

    # Create a RunConfig with a specified timeout
    run_config = RunConfig(timeout=120)  # Set timeout to 120 seconds

    # Use the evaluate function with the custom run_config
    result = evaluate(
        dataset,
        metrics=[
            context_precision,
            faithfulness,
            answer_relevancy,
            context_recall,
            context_utilization,
            context_entity_recall,
            answer_correctness,
            answer_similarity,
            harmfulness,
        ],
        run_config=run_config
    )

    df = result.to_pandas()
    print(result)
    df.to_csv('results_es.csv', index=False)

  2. Known Issues and Limitations: Some configurations are known to cause problems in the ragas library. For example, asynchronous callbacks are not supported, and different model types require model-specific configuration [1].

  3. Debugging and Logging: Enable debugging and logging to diagnose the TimeoutError. Set the log_tenacity parameter in RunConfig to True and set the RAGAS_DEBUG environment variable:

    import os
    from ragas import RunConfig
    
    os.environ["RAGAS_DEBUG"] = "true"
    
    run_config = RunConfig(timeout=120, log_tenacity=True)

    This logs retry attempts and emits additional debug output, which helps show where jobs stall [1].

These steps should help you diagnose the TimeoutError. If the underlying model calls are simply slow, raising the timeout may be enough for evaluate() to complete on your dataset.


mmiumiu commented 1 month ago

I get the same error. Have you fixed it?

CraigMChen commented 1 week ago

+1

jjmachan commented 1 week ago

@CraigMChen does it also happen when you set a timeout?

CraigMChen commented 1 week ago

@jjmachan

Here's my code:

from ragas.llms import LangchainLLMWrapper
from ragas import evaluate
from langchain.chat_models import ChatOllama
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    context_recall,
    context_precision,
)

llm = ChatOllama(model="llama3.2", base_url="http://127.0.0.1:11434", request_timeout=3000)
print(llm.invoke("hello")) # ok

result = evaluate(
    llm=LangchainLLMWrapper(llm),
    dataset=dataset,
    metrics=[
        context_precision,
        context_recall,
        faithfulness,
        answer_relevancy,
    ],
)

df = result.to_pandas()

Error:

Exception raised in Job[0]: TimeoutError()
Exception raised in Job[3]: TimeoutError()
Exception raised in Job[2]: TimeoutError()
Exception raised in Job[1]: TimeoutError()
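
One possible explanation for every job timing out against a local model is queueing: if the server handles one request at a time and each job's timer starts when the job is submitted, jobs at the back of the queue can expire before they are ever served. The following is a toy asyncio model of that effect, not ragas's implementation. The names `slow_backend`, `run_jobs`, and `count_timeouts` are hypothetical, and both the one-request-at-a-time backend and the timer covering queue time are assumptions.

```python
import asyncio


async def slow_backend(lock: asyncio.Lock, latency: float) -> str:
    # Hypothetical stand-in for a local model server that serves
    # one request at a time; `latency` is the time per request.
    async with lock:
        await asyncio.sleep(latency)
    return "ok"


async def run_jobs(n_jobs: int, max_workers: int, timeout: float, latency: float) -> int:
    """Run n_jobs against the toy backend and return how many timed out."""
    lock = asyncio.Lock()
    workers = asyncio.Semaphore(max_workers)

    async def job() -> bool:
        # The semaphore caps how many jobs are in flight. The timeout clock
        # only starts once a job holds a worker slot.
        async with workers:
            try:
                await asyncio.wait_for(slow_backend(lock, latency), timeout=timeout)
                return False
            except asyncio.TimeoutError:
                return True

    results = await asyncio.gather(*(job() for _ in range(n_jobs)))
    return sum(results)


def count_timeouts(n_jobs: int, max_workers: int, timeout: float, latency: float) -> int:
    """Synchronous wrapper around run_jobs."""
    return asyncio.run(run_jobs(n_jobs, max_workers, timeout, latency))


if __name__ == "__main__":
    # Same backend speed and timeout, different concurrency.
    for max_workers in (8, 1):
        n = count_timeouts(n_jobs=8, max_workers=max_workers, timeout=0.2, latency=0.05)
        print(f"max_workers={max_workers}: {n} job(s) timed out")
```

In this model, lowering concurrency removes the timeouts without making the backend any faster, because jobs stop spending their time budget waiting in line. If the RunConfig in your ragas version has a max_workers field (an assumption to check against your installed version), lowering it together with a larger timeout would be the analogous change to try.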