explodinggradients / ragas

Supercharge Your LLM Application Evaluations 🚀
https://docs.ragas.io
Apache License 2.0

[R-291] RuntimeError: There is no current event loop in thread 'MainThread'. #1136

Closed Mervyn-1 closed 3 weeks ago

Mervyn-1 commented 3 months ago

[ ] I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug

RuntimeError: There is no current event loop in thread 'MainThread'.
sys:1: RuntimeWarning: coroutine 'Executor.wrap_callable_with_index.<locals>.wrapped_callable_async' was never awaited

I encountered this issue during batch evaluation: when the batch list exceeds a certain length (here, 15), the following errors occur. Even when the first evaluation call completes successfully, the second call triggers the error:

RuntimeError: Task <Task pending name='Task-31' coro=<as_completed.<locals>.sema_coro() running at /Users/.../python3.9/site-packages/ragas/executor.py:33> cb=[as_completed.<locals>._on_completion() at /Users/.../python3.9/asyncio/tasks.py:598]> got Future <Future pending> attached to a different loop

RuntimeError: There is no current event loop in thread 'MainThread'. sys:1: RuntimeWarning: coroutine 'Executor.wrap_callable_with_index.<locals>.wrapped_callable_async' was never awaited

Ragas version: 0.1.12.dev3+g95d8318, Python version: 3.9.19
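
For context, the second error above ("got Future ... attached to a different loop") is a generic asyncio failure mode, not specific to ragas: it occurs whenever a Future created on one event loop is awaited from another. A minimal, ragas-independent sketch that reproduces the same message:

import asyncio

loop_a = asyncio.new_event_loop()
fut = loop_a.create_future()  # fut is bound to loop_a

async def waiter():
    await fut  # runs on loop_b, but fut belongs to loop_a

loop_b = asyncio.new_event_loop()
try:
    # Raises: RuntimeError: Task ... got Future ... attached to a different loop
    loop_b.run_until_complete(waiter())
finally:
    loop_b.close()
    loop_a.close()

One plausible reading of the traceback above is that ragas' executor ends up awaiting a Future bound to the loop from an earlier call; that is a guess from the message, not a confirmed diagnosis.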

Code to Reproduce

import pandas as pd
from datasets import Dataset
from langchain_openai import ChatOpenAI
from ragas import evaluate
from ragas.metrics import answer_correctness


class RagEvaluator:
    def __init__(self):
        self.llm = ChatOpenAI(model="deepseek-chat", api_key="", base_url="https://api.deepseek.com/v1", temperature=0.7)
        # project-specific embedding wrapper (defined elsewhere)
        self.wrapped_embeddings = EmbeddingFactory.build_embedding_eas_service()

    def evaluate_rag(self, question, answer, ground_truth):
        data_samples = {
            'question': question,
            'answer': answer,
            'ground_truth': ground_truth
        }
        dataset = Dataset.from_dict(data_samples)
        result = evaluate(dataset, metrics=[answer_correctness], llm=self.llm, embeddings=self.wrapped_embeddings)
        return result

df = pd.read_csv('/Users/.../merged_results.csv', encoding='utf-8')
question = df['question'].tolist()
answer = df['answer'].tolist()
ground_truth = df['ground_truth'].tolist()

data = {
    'question': question,
    'answer': answer,
    'ground_truth': ground_truth
}

dataset = Dataset.from_dict(data)
evaluator = RagEvaluator()

result = evaluator.evaluate_rag(question[:15], answer[:15], ground_truth[:15])
print(result)

result2 = evaluator.evaluate_rag(question[15:], answer[15:], ground_truth[15:])  # raises the RuntimeError above


dosubot[bot] commented 3 months ago

Hey @Mervyn-1! 👋 I'm here to help you with any bugs, questions, or contributions you have while waiting for a human maintainer. Let's squash those bugs together!

The issue you're encountering is related to how asyncio event loops are created and reused across repeated evaluations. To resolve this, make sure each evaluation runs with a valid current event loop. Here is a modified version of your RagEvaluator class that manages the loop explicitly:

import asyncio
import pandas as pd
from datasets import Dataset
from langchain_openai import ChatOpenAI
from ragas import evaluate
from ragas.metrics import answer_correctness

class RagEvaluator:
    def __init__(self):
        self.llm = ChatOpenAI(model="deepseek-chat", api_key="", base_url="https://api.deepseek.com/v1", temperature=0.7)
        # project-specific embedding wrapper (defined elsewhere)
        self.wrapped_embeddings = EmbeddingFactory.build_embedding_eas_service()

    def evaluate_rag(self, question, answer, ground_truth):
        # Give each evaluation a fresh event loop so ragas' internal executor
        # never picks up a loop that a previous call left closed or unset.
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        try:
            data_samples = {
                'question': question,
                'answer': answer,
                'ground_truth': ground_truth
            }
            dataset = Dataset.from_dict(data_samples)
            # evaluate() is synchronous; it drives its own coroutines internally.
            return evaluate(dataset, metrics=[answer_correctness], llm=self.llm, embeddings=self.wrapped_embeddings)
        finally:
            loop.close()

df = pd.read_csv('/Users/.../merged_results.csv', encoding='utf-8')
question = df['question'].tolist()
answer = df['answer'].tolist()
ground_truth = df['ground_truth'].tolist()

evaluator = RagEvaluator()

result = evaluator.evaluate_rag(question[:15], answer[:15], ground_truth[:15])
print(result)

result2 = evaluator.evaluate_rag(question[15:], answer[15:], ground_truth[15:])
print(result2)

In this version, the evaluate_rag method creates a fresh event loop for each evaluation, sets it as the current loop before calling the synchronous evaluate(), and closes it afterwards, so ragas' internal executor never reuses a stale loop. This should prevent the RuntimeError related to event loops from occurring [1].

To continue talking to Dosu, mention @dosu.

jjmachan commented 3 months ago

@Mervyn-1 are you running this as a script, in a Jupyter notebook, or in some other environment?

Mervyn-1 commented 3 months ago

> @Mervyn-1 are you running this as a script, in a Jupyter notebook, or in some other environment?

as a script

KendrickChou commented 3 months ago

Same here. I rolled back to version 0.1.10 and it works fine.

Mervyn-1 commented 3 months ago

@jjmachan @KendrickChou I switched to Python 3.10.14 and the bug no longer occurs. Could it be due to a specific change in Python 3.10?

jjmachan commented 3 months ago

@KendrickChou are you using it as a script too? And which Python version?

@Mervyn-1 that really helps ❤️. I wasn't able to find anything specific in the Python docs that changed with 3.10, but let me take a closer look.
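
For what it's worth, here is a ragas-independent sketch that triggers the exact message from the report: asyncio.run() closes its loop and unsets the current event loop when it finishes, so a later asyncio.get_event_loop() in the main thread raises. Whether this is the path ragas actually hits is an assumption, and this particular behavior is the same on 3.9 and 3.10, so it doesn't by itself explain the version difference:

import asyncio

async def job():
    return 42

asyncio.run(job())  # on exit: set_event_loop(None) and loop.close()

# RuntimeError: There is no current event loop in thread 'MainThread'.
asyncio.get_event_loop()

The "first call works, second call fails" pattern in the report matches this shape: whatever ran the first evaluation left the main thread without a current event loop.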

ckevinhill commented 1 month ago

Switching to 3.10.15 fixed the issue for me as well.

bsbodden commented 1 month ago

I'm seeing this issue with Ragas 0.1.20 and Python 3.12

jjmachan commented 4 weeks ago

Hey @ckevinhill @bsbodden, could you help me out here? I'm trying to reproduce this issue but can't trigger it.

Using ragas 0.1.20 with Python 3.9 and 3.12.

Code:

from datasets import load_dataset, DatasetDict
import typing as t

from ragas import evaluate
from ragas.metrics import answer_relevancy, faithfulness
from ragas.metrics._aspect_critic import harmfulness

ds = t.cast(DatasetDict, load_dataset("explodinggradients/fiqa", "ragas_eval"))[
    "baseline"
]
result = evaluate(
    ds.select(range(2)),
    metrics=[answer_relevancy, faithfulness, harmfulness],
)
assert result is not None

Is there anything that I'm missing here?
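
One possible gap, going by the original report rather than a confirmed cause: the script above calls evaluate() only once, while the reported failure appears on the second evaluate() call in the same process. A sketch of that pattern, reusing the ds and metrics already defined above:

# The original report only fails on the *second* evaluation in one process.
result1 = evaluate(ds.select(range(2)), metrics=[answer_relevancy, faithfulness])
result2 = evaluate(ds.select(range(2, 4)), metrics=[answer_relevancy, faithfulness])  # reported to raise on Python 3.9
assert result1 is not None and result2 is not None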

github-actions[bot] commented 3 weeks ago

Closing after 8 days of waiting for the additional info requested.