Closed: Mervyn-1 closed this issue 3 weeks ago
Hey @Mervyn-1! 👋 I'm here to help you with any bugs, questions, or contributions you have while waiting for a human maintainer. Let's squash those bugs together!
The issue you're encountering is related to the handling of asyncio event loops in a multi-threaded environment. To resolve this, you need to ensure that the event loop is properly managed and that coroutines are awaited in the correct loop. Here is a modified version of your `RagEvaluator` class that handles the event loop correctly:
```python
import asyncio

import pandas as pd
from datasets import Dataset
from langchain_openai import ChatOpenAI
from ragas import evaluate
from ragas.metrics import answer_correctness
from ragas.embeddings import EmbeddingFactory  # assuming this factory exists in your setup


class RagEvaluator:
    def __init__(self):
        self.llm = ChatOpenAI(
            model="deepseek-chat",
            api_key="",
            base_url="https://api.deepseek.com/v1",
            temperature=0.7,
        )
        self.wrapped_embeddings = EmbeddingFactory.build_embedding_eas_service()

    async def evaluate_rag_async(self, question, answer, ground_truth):
        data_samples = {
            "question": question,
            "answer": answer,
            "ground_truth": ground_truth,
        }
        dataset = Dataset.from_dict(data_samples)
        # Note: ragas' evaluate() is synchronous, so it is not awaited here.
        result = evaluate(
            dataset,
            metrics=[answer_correctness],
            llm=self.llm,
            embeddings=self.wrapped_embeddings,
        )
        return result

    def evaluate_rag(self, question, answer, ground_truth):
        # Create a fresh event loop for this call and close it when done.
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        try:
            return loop.run_until_complete(
                self.evaluate_rag_async(question, answer, ground_truth)
            )
        finally:
            loop.close()


df = pd.read_csv("/Users/.../merged_results.csv", encoding="utf-8")
question = df["question"].tolist()
answer = df["answer"].tolist()
ground_truth = df["ground_truth"].tolist()

evaluator = RagEvaluator()

result = evaluator.evaluate_rag(question[:15], answer[:15], ground_truth[:15])
print(result)

result2 = evaluator.evaluate_rag(question[15:], answer[15:], ground_truth[15:])
print(result2)
```
In this modified version, the `evaluate_rag` method creates a new event loop for each evaluation, runs the asynchronous `evaluate_rag_async` method inside it, and closes the loop when the call finishes. This should prevent the `RuntimeError` related to event loops from occurring [1].
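As a side note, since Python 3.7 the manual loop management above can usually be replaced with `asyncio.run`, which creates a fresh loop, runs the coroutine to completion, and closes the loop afterwards. A minimal sketch of that pattern (the coroutine here is a stand-in, not the real ragas call):

```python
import asyncio

# Stand-in for an async evaluation call such as evaluate_rag_async;
# the real coroutine would await the actual evaluation work instead.
async def evaluate_async(questions):
    await asyncio.sleep(0)  # yield control, as a real awaited call would
    return {"evaluated": len(questions)}

def evaluate_sync(questions):
    # asyncio.run creates a new event loop, runs the coroutine, and
    # closes the loop afterwards -- so repeated calls each get a clean
    # loop, like the new_event_loop/run_until_complete pattern above.
    return asyncio.run(evaluate_async(questions))

print(evaluate_sync(["q1", "q2"]))  # {'evaluated': 2}
```

Note that `asyncio.run` raises a `RuntimeError` if called from a thread that already has a running event loop (e.g. inside a Jupyter cell), which is why it matters whether this is run as a script or in a notebook.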
@Mervyn-1 are you running this as a script or in jupyter notebook or any other environment?
as a script
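The script-vs-notebook distinction matters because a notebook kernel already has an event loop running at the top level, while a plain script does not, and `loop.run_until_complete` or `asyncio.run` fail inside an already-running loop. A quick way to check which situation you are in (illustrative helper, not part of ragas):

```python
import asyncio

def has_running_loop():
    # In a plain script there is no running loop at the top level, so
    # asyncio.get_running_loop() raises RuntimeError. In a Jupyter
    # notebook the kernel's loop is already running, and calling
    # run_until_complete() from inside it fails.
    try:
        asyncio.get_running_loop()
        return True
    except RuntimeError:
        return False

print(has_running_loop())  # False when run as a plain script
```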
same here, I rolled back to version 0.1.10 and it works fine
@jjmachan @KendrickChou I switched to Python version 3.10.14, and the bug no longer occurs. Could it be due to a specific feature in Python 3.10?
@KendrickChou are you using it as a script too? And which Python version?
@Mervyn-1 that really helps ❤️. I was not able to find anything specific in the Python docs that changed with 3.10, but let me take a closer look.
Switching to 3.10.15 fixed the issue for me as well.
I'm seeing this issue with Ragas 0.1.20 and Python 3.12
hey @ckevinhill @bsbodden could you help me out here, I'm trying to reproduce this issue but can't get it
using ragas 0.1.20, Python 3.9 and 3.12

code

```python
import typing as t

from datasets import DatasetDict, load_dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, faithfulness
from ragas.metrics._aspect_critic import harmfulness

ds = t.cast(DatasetDict, load_dataset("explodinggradients/fiqa", "ragas_eval"))[
    "baseline"
]

result = evaluate(
    ds.select(range(2)),
    metrics=[answer_relevancy, faithfulness, harmfulness],
)
assert result is not None
```
is there anything that I'm missing here?
Closing after 8 days of waiting for the additional info requested.
- [ ] I have checked the documentation and related resources and couldn't resolve my bug.
Describe the bug
I encountered this issue during batch evaluation: when the batch list exceeds a certain length (15), the following errors occur. Even after completing the first evaluation loop, the second loop triggers the error:
Ragas version: 0.1.12.dev3+g95d8318, Python version: 3.9.19
Code to Reproduce
Error trace
Expected behavior
A clear and concise description of what you expected to happen.

Additional context
Add any other context about the problem here.