explodinggradients / ragas

Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines
https://docs.ragas.io
Apache License 2.0
6.79k stars, 669 forks

Random RuntimeError: Tool context error detected. This can occur due to parallelization in VertexAI #957

Status: Open. franck-cussac opened this issue 4 months ago

franck-cussac commented 4 months ago

Describe the bug

When I run the evaluation, it sometimes succeeds and sometimes fails with:

RuntimeError: Tool context error detected. This can occur due to parallelization.

Ragas version: 0.1.7
Python version: 3.10.12

Code to Reproduce

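import pandas as pd
from datasets import Dataset
from langchain_google_vertexai import ChatVertexAI, VertexAIEmbeddings
from ragas import evaluate
from ragas.metrics import (
    answer_correctness, answer_relevancy, answer_similarity,
    context_precision, context_recall, faithfulness,
)
from ragas.metrics.critique import harmfulness

# MODEL_GEMINI and MODEL_EMBEDDING are model-name constants defined elsewhere.


def evaluation(answers: Dataset) -> pd.DataFrame:
    # Gemini chat model used as the evaluator LLM
    vertextai_llm = ChatVertexAI(
        model_name=MODEL_GEMINI,
    )
    # Vertex AI embeddings, needed by the embedding-based metrics
    vertextai_embeddings = VertexAIEmbeddings(
        model_name=MODEL_EMBEDDING,
    )

    return evaluate(
        answers,
        metrics=[
            faithfulness,
            answer_relevancy,
            context_recall,
            context_precision,
            harmfulness,
            answer_similarity,
            answer_correctness,
        ],
        llm=vertextai_llm,
        embeddings=vertextai_embeddings,
    ).to_pandas()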

Error trace

Evaluating:   3%|████▊                                                                                                                                                                      | 1/36 [00:03<02:02,  3.49s/it]
Exception in thread Thread-12:
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/lib/python3.10/site-packages/ragas/executor.py", line 96, in run
    results = self.loop.run_until_complete(self._aresults())
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/lib/python3.10/site-packages/ragas/executor.py", line 84, in _aresults
    raise e
  File "/lib/python3.10/site-packages/ragas/executor.py", line 79, in _aresults
    r = await future
  File "/usr/lib/python3.10/asyncio/tasks.py", line 571, in _wait_for_one
    return f.result()  # May raise f.exception().
  File "/lib/python3.10/site-packages/ragas/executor.py", line 38, in sema_coro
    return await coro
  File "/home/lib/python3.10/site-packages/ragas/executor.py", line 112, in wrapped_callable_async
    return counter, await callable(*args, **kwargs)
  File "/lib/python3.10/site-packages/ragas/metrics/base.py", line 116, in ascore
    raise e
  File "/lib/python3.10/site-packages/ragas/metrics/base.py", line 112, in ascore
    score = await self._ascore(row=row, callbacks=group_cm, is_async=is_async)
  File "/lib/python3.10/site-packages/ragas/metrics/_answer_relevance.py", line 167, in _ascore
    return self._calculate_score(answers, row)
  File "/lib/python3.10/site-packages/ragas/metrics/_answer_relevance.py", line 139, in _calculate_score
    cosine_sim = self.calculate_similarity(question, gen_questions)
  File "/lib/python3.10/site-packages/ragas/metrics/_answer_relevance.py", line 115, in calculate_similarity
    self.embeddings.embed_documents(generated_questions)
  File "/lib/python3.10/site-packages/ragas/embeddings/base.py", line 58, in embed_documents
    return self.embeddings.embed_documents(texts)
  File "/lib/python3.10/site-packages/langchain_google_vertexai/embeddings.py", line 379, in embed_documents
    return self.embed(texts, batch_size, "RETRIEVAL_DOCUMENT")
  File "/lib/python3.10/site-packages/langchain_google_vertexai/embeddings.py", line 362, in embed
    embeddings.extend(t.result())
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/lib/python3.10/site-packages/langchain_google_vertexai/embeddings.py", line 193, in _get_embeddings_with_retry
    with telemetry.tool_context_manager(self._user_agent):
  File "/usr/lib/python3.10/contextlib.py", line 142, in __exit__
    next(self.gen)
  File "/lib/python3.10/site-packages/google/cloud/aiplatform/telemetry.py", line 48, in tool_context_manager
    _pop_tool_name(tool_name)
  File "/lib/python3.10/site-packages/google/cloud/aiplatform/telemetry.py", line 57, in _pop_tool_name
    raise RuntimeError(
RuntimeError: Tool context error detected. This can occur due to parallelization.
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/src/evaluation/main.py", line 47, in main
    df = evaluation(dataset)
  File "/src/evaluation/main.py", line 59, in evaluation
    return evaluate(
  File "/lib/python3.10/site-packages/ragas/evaluation.py", line 231, in evaluate
    raise e
  File "/lib/python3.10/site-packages/ragas/evaluation.py", line 213, in evaluate
    raise ExceptionInRunner()
ragas.exceptions.ExceptionInRunner: The runner thread which was running the jobs raised an exeception. Read the traceback above to debug it. You can also pass `raise_exceptions=False` incase you want to show only a warning message instead.
sys:1: RuntimeWarning: coroutine 'Executor.wrap_callable_with_index.<locals>.wrapped_callable_async' was never awaited
Task was destroyed but it is pending!
task: <Task pending name='Task-4' coro=<as_completed.<locals>.sema_coro() running at /lib/python3.10/site-packages/ragas/executor.py:38> wait_for=<Future pending cb=[_chain_future.<locals>._call_check_cancel() at /usr/lib/python3.10/asyncio/futures.py:385, Task.task_wakeup()]> cb=[as_completed.<locals>._on_completion() at /usr/lib/python3.10/asyncio/tasks.py:558]>
Task was destroyed but it is pending!
task: <Task pending name='Task-2' coro=<as_completed.<locals>.sema_coro() running at /lib/python3.10/site-packages/ragas/executor.py:38> wait_for=<Future pending cb=[_chain_future.<locals>._call_check_cancel() at /usr/lib/python3.10/asyncio/futures.py:385, Task.task_wakeup()]> cb=[as_completed.<locals>._on_completion() at /usr/lib/python3.10/asyncio/tasks.py:558]>
Task was destroyed but it is pending!
task: <Task pending name='Task-21' coro=<as_completed.<locals>.sema_coro() running at /lib/python3.10/site-packages/ragas/executor.py:38> wait_for=<Future pending cb=[_chain_future.<locals>._call_check_cancel() at /usr/lib/python3.10/asyncio/futures.py:385, Task.task_wakeup()]> cb=[as_completed.<locals>._on_completion() at /usr/lib/python3.10/asyncio/tasks.py:558]>
Task was destroyed but it is pending!
task: <Task pending name='Task-18' coro=<as_completed.<locals>.sema_coro() running at /lib/python3.10/site-packages/ragas/executor.py:38> wait_for=<Future pending cb=[_chain_future.<locals>._call_check_cancel() at /usr/lib/python3.10/asyncio/futures.py:385, Task.task_wakeup()]> cb=[as_completed.<locals>._on_completion() at /usr/lib/python3.10/asyncio/tasks.py:558]>
Task was destroyed but it is pending!
task: <Task pending name='Task-12' coro=<as_completed.<locals>.sema_coro() running at /lib/python3.10/site-packages/ragas/executor.py:38> wait_for=<Future pending cb=[_chain_future.<locals>._call_check_cancel() at /usr/lib/python3.10/asyncio/futures.py:385, Task.task_wakeup()]> cb=[as_completed.<locals>._on_completion() at /usr/lib/python3.10/asyncio/tasks.py:558]>
Task was destroyed but it is pending!
task: <Task pending name='Task-10' coro=<as_completed.<locals>.sema_coro() running at /lib/python3.10/site-packages/ragas/executor.py:38> wait_for=<Future pending cb=[_chain_future.<locals>._call_check_cancel() at /usr/lib/python3.10/asyncio/futures.py:385, Task.task_wakeup()]> cb=[as_completed.<locals>._on_completion() at /usr/lib/python3.10/asyncio/tasks.py:558]>
Task was destroyed but it is pending!
task: <Task pending name='Task-9' coro=<as_completed.<locals>.sema_coro() running at /lib/python3.10/site-packages/ragas/executor.py:38> wait_for=<Future pending cb=[_chain_future.<locals>._call_check_cancel() at /usr/lib/python3.10/asyncio/futures.py:385, Task.task_wakeup()]> cb=[as_completed.<locals>._on_completion() at /usr/lib/python3.10/asyncio/tasks.py:558]>
Task was destroyed but it is pending!
task: <Task pending name='Task-8' coro=<as_completed.<locals>.sema_coro() running at /lib/python3.10/site-packages/ragas/executor.py:38> wait_for=<Future pending cb=[_chain_future.<locals>._call_check_cancel() at /usr/lib/python3.10/asyncio/futures.py:385, Task.task_wakeup()]> cb=[as_completed.<locals>._on_completion() at /usr/lib/python3.10/asyncio/tasks.py:558]>
Task was destroyed but it is pending!
task: <Task pending name='Task-7' coro=<as_completed.<locals>.sema_coro() running at /lib/python3.10/site-packages/ragas/executor.py:38> wait_for=<Future pending cb=[_chain_future.<locals>._call_check_cancel() at /usr/lib/python3.10/asyncio/futures.py:385, Task.task_wakeup()]> cb=[as_completed.<locals>._on_completion() at /usr/lib/python3.10/asyncio/tasks.py:558]>
Task was destroyed but it is pending!
task: <Task pending name='Task-5' coro=<as_completed.<locals>.sema_coro() running at /lib/python3.10/site-packages/ragas/executor.py:38> wait_for=<Future pending cb=[_chain_future.<locals>._call_check_cancel() at /usr/lib/python3.10/asyncio/futures.py:385, Task.task_wakeup()]> cb=[as_completed.<locals>._on_completion() at /usr/lib/python3.10/asyncio/tasks.py:558]>
Task was destroyed but it is pending!
task: <Task pending name='Task-3' coro=<as_completed.<locals>.sema_coro() running at /lib/python3.10/site-packages/ragas/executor.py:38> wait_for=<Future pending cb=[_chain_future.<locals>._call_check_cancel() at /usr/lib/python3.10/asyncio/futures.py:385, Task.task_wakeup()]> cb=[as_completed.<locals>._on_completion() at /usr/lib/python3.10/asyncio/tasks.py:558]>
Task was destroyed but it is pending!
task: <Task pending name='Task-19' coro=<as_completed.<locals>.sema_coro() running at /lib/python3.10/site-packages/ragas/executor.py:38> wait_for=<Future pending cb=[_chain_future.<locals>._call_check_cancel() at /usr/lib/python3.10/asyncio/futures.py:385, Task.task_wakeup()]> cb=[as_completed.<locals>._on_completion() at /usr/lib/python3.10/asyncio/tasks.py:558]>
Task was destroyed but it is pending!
task: <Task pending name='Task-17' coro=<as_completed.<locals>.sema_coro() running at /lib/python3.10/site-packages/ragas/executor.py:38> wait_for=<Future pending cb=[_chain_future.<locals>._call_check_cancel() at /usr/lib/python3.10/asyncio/futures.py:385, Task.task_wakeup()]> cb=[as_completed.<locals>._on_completion() at /usr/lib/python3.10/asyncio/tasks.py:558]>
Task was destroyed but it is pending!
task: <Task pending name='Task-13' coro=<as_completed.<locals>.sema_coro() running at /lib/python3.10/site-packages/ragas/executor.py:38> wait_for=<Future pending cb=[_chain_future.<locals>._call_check_cancel() at /usr/lib/python3.10/asyncio/futures.py:385, Task.task_wakeup()]> cb=[as_completed.<locals>._on_completion() at /usr/lib/python3.10/asyncio/tasks.py:558]>
Task was destroyed but it is pending!
task: <Task pending name='Task-11' coro=<as_completed.<locals>.sema_coro() running at /lib/python3.10/site-packages/ragas/executor.py:38> wait_for=<Future pending cb=[_chain_future.<locals>._call_check_cancel() at /usr/lib/python3.10/asyncio/futures.py:385, Task.task_wakeup()]> cb=[as_completed.<locals>._on_completion() at /usr/lib/python3.10/asyncio/tasks.py:558]>
Task was destroyed but it is pending!
task: <Task pending name='Task-20' coro=<as_completed.<locals>.sema_coro() running at /lib/python3.10/site-packages/ragas/executor.py:38> wait_for=<Future pending cb=[_chain_future.<locals>._call_check_cancel() at /usr/lib/python3.10/asyncio/futures.py:385, Task.task_wakeup()]> cb=[as_completed.<locals>._on_completion() at /usr/lib/python3.10/asyncio/tasks.py:558]>

Expected behavior

The evaluation should succeed every time.

Additional context

I'm using Vertex AI and followed the provided example notebook.
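
The traceback above ends in `google.cloud.aiplatform.telemetry`: `tool_context_manager` records a tool name on entry, and on exit `_pop_tool_name` raises this `RuntimeError` when the name it expects is not the one it finds. A minimal model of that failure, assuming (a reconstruction from the traceback, not the library's source) that the names live in a single process-wide list shared by every thread. `TOOL_STACK`, `tool_context`, `run_interleaved`, and the names `"chat-model"` and `"embeddings"` are hypothetical. The events force the unlucky ordering: thread A enters, thread B enters, then A exits while B's name is on top. In a real run that ordering depends on thread scheduling, which would be consistent with the evaluation failing only some of the time.

```python
import contextlib
import threading

# Hypothetical model of the telemetry state: ONE list shared by all threads.
TOOL_STACK: list[str] = []


@contextlib.contextmanager
def tool_context(name: str):
    """Push `name` on entry; on exit, pop it only if it is still on top."""
    TOOL_STACK.append(name)
    try:
        yield
    finally:
        if not TOOL_STACK or TOOL_STACK[-1] != name:
            raise RuntimeError("Tool context error detected.")
        TOOL_STACK.pop()


def run_interleaved() -> list[str]:
    """Force the bad ordering: A enters, B enters, A exits first."""
    TOOL_STACK.clear()
    errors: list[str] = []
    a_entered = threading.Event()
    b_entered = threading.Event()
    a_done = threading.Event()

    def worker_a() -> None:  # e.g. a chat-model call
        try:
            with tool_context("chat-model"):
                a_entered.set()
                b_entered.wait()  # B pushes its name while A is still inside
        except RuntimeError as exc:
            errors.append(f"A: {exc}")
        finally:
            a_done.set()

    def worker_b() -> None:  # e.g. an embeddings call
        a_entered.wait()
        try:
            with tool_context("embeddings"):
                b_entered.set()
                a_done.wait()  # stay inside until A has tried to exit
        except RuntimeError as exc:
            errors.append(f"B: {exc}")

    threads = [threading.Thread(target=worker_a), threading.Thread(target=worker_b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors
```

Calling `run_interleaved()` returns `['A: Tool context error detected.']` and leaves `TOOL_STACK` as `['chat-model']`, because A raises before it can pop its own entry while B exits cleanly.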

emmaebrl commented 4 months ago

I encountered the same error and found that it is triggered only by the "answer_relevancy" metric. All the other metrics work fine for me.
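
The first traceback runs through `calculate_similarity` in `_answer_relevance.py`, which embeds the generated questions with `embed_documents`, so this metric calls the Vertex AI embedding model in addition to the LLM. A simplified sketch of that similarity step, assuming plain Python lists as vectors: the score is the mean cosine similarity between the original question's embedding and each generated question's embedding. `cosine` and `mean_question_similarity` are hypothetical names, and the actual ragas metric also applies a noncommittal penalty that is omitted here.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def mean_question_similarity(
    question_vec: list[float], generated_vecs: list[list[float]]
) -> float:
    """Average cosine similarity of each generated question to the original."""
    sims = [cosine(question_vec, g) for g in generated_vecs]
    return sum(sims) / len(sims)
```

For example, `mean_question_similarity([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])` returns `0.5`: one generated question points the same way as the original and the other is orthogonal to it.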

franck-cussac commented 4 months ago

I tested without it and still get an error:

Creating json from Arrow format: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 58.06ba/s]
Evaluating:   9%|███████████████▉                                                                                                                                                                | 202/2232 [01:45<10:10,  3.33it/s]Failed to parse output. Returning None.
Evaluating:   9%|████████████████▋                                                                                                                                                               | 211/2232 [01:49<20:42,  1.63it/s]Failed to parse output. Returning None.
Evaluating:  11%|███████████████████▏                                                                                                                                                            | 243/2232 [01:59<11:26,  2.90it/s]Failed to parse output. Returning None.
Evaluating:  12%|█████████████████████▌                                                                                                                                                          | 273/2232 [02:10<12:32,  2.60it/s]Failed to parse output. Returning None.
Evaluating:  13%|██████████████████████▏                                                                                                                                                         | 282/2232 [02:12<05:35,  5.82it/s]Failed to parse output. Returning None.
Evaluating:  59%|███████████████████████████████████████████████████████████████████████████████████████████████████████                                                                        | 1314/2232 [09:25<06:35,  2.32it/s]
Exception in thread Thread-376:
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/lib/python3.10/site-packages/ragas/executor.py", line 96, in run
    results = self.loop.run_until_complete(self._aresults())
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/lib/python3.10/site-packages/ragas/executor.py", line 84, in _aresults
    raise e
  File "/lib/python3.10/site-packages/ragas/executor.py", line 79, in _aresults
    r = await future
  File "/usr/lib/python3.10/asyncio/tasks.py", line 571, in _wait_for_one
    return f.result()  # May raise f.exception().
  File "/lib/python3.10/site-packages/ragas/executor.py", line 38, in sema_coro
    return await coro
  File "/lib/python3.10/site-packages/ragas/executor.py", line 112, in wrapped_callable_async
    return counter, await callable(*args, **kwargs)
  File "/lib/python3.10/site-packages/ragas/metrics/base.py", line 116, in ascore
    raise e
  File "/lib/python3.10/site-packages/ragas/metrics/base.py", line 112, in ascore
    score = await self._ascore(row=row, callbacks=group_cm, is_async=is_async)
  File "/lib/python3.10/site-packages/ragas/metrics/_context_recall.py", line 147, in _ascore
    result = await self.llm.generate(
  File "/lib/python3.10/site-packages/ragas/llms/base.py", line 110, in generate
    return await loop.run_in_executor(None, generate_text)
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/lib/python3.10/site-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/lib/python3.10/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/lib/python3.10/site-packages/tenacity/__init__.py", line 325, in iter
    raise retry_exc.reraise()
  File "/lib/python3.10/site-packages/tenacity/__init__.py", line 158, in reraise
    raise self.last_attempt.result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/lib/python3.10/site-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/lib/python3.10/site-packages/ragas/llms/base.py", line 147, in generate_text
    result = self.langchain_llm.generate_prompt(
  File "/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 560, in generate_prompt
    return self.generate(prompt_messages, stop=stop, callbacks=callbacks, **kwargs)
  File "/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 421, in generate
    raise e
  File "/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 411, in generate
    self._generate_with_cache(
  File "/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 632, in _generate_with_cache
    result = self._generate(
  File "/lib/python3.10/site-packages/langchain_google_vertexai/chat_models.py", line 489, in _generate
    with telemetry.tool_context_manager(self._user_agent):
  File "/usr/lib/python3.10/contextlib.py", line 142, in __exit__
    next(self.gen)
  File "/lib/python3.10/site-packages/google/cloud/aiplatform/telemetry.py", line 48, in tool_context_manager
    _pop_tool_name(tool_name)
  File "/lib/python3.10/site-packages/google/cloud/aiplatform/telemetry.py", line 57, in _pop_tool_name
    raise RuntimeError(
RuntimeError: Tool context error detected. This can occur due to parallelization.
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "src/evaluation/main.py", line 50, in main
    df = evaluation(dataset)
  File "src/evaluation/main.py", line 62, in evaluation
    return evaluate(
  File "/lib/python3.10/site-packages/ragas/evaluation.py", line 231, in evaluate
    raise e
  File "/lib/python3.10/site-packages/ragas/evaluation.py", line 213, in evaluate
    raise ExceptionInRunner()
ragas.exceptions.ExceptionInRunner: The runner thread which was running the jobs raised an exeception. Read the traceback above to debug it. You can also pass `raise_exceptions=False` incase you want to show only a warning message instead.
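
The `ExceptionInRunner` message suggests passing `raise_exceptions=False` to `evaluate`, which reports failed jobs as warnings instead of aborting the whole run. That hides the error rather than fixing it. The idea can be sketched with plain `asyncio`, assuming as a simplification (this is not ragas's executor) that each metric job is a coroutine and a failed job is recorded as `NaN`: `asyncio.gather(..., return_exceptions=True)` returns exceptions as values, so the caller decides whether to re-raise or keep going. `score_row` and `run_jobs` are hypothetical.

```python
import asyncio
import math
import warnings


async def score_row(i: int) -> float:
    """Hypothetical metric job: row 1 fails, the others return a score."""
    await asyncio.sleep(0)
    if i == 1:
        raise RuntimeError("Tool context error detected.")
    return i / 10


async def run_jobs(n: int, raise_exceptions: bool = True) -> list[float]:
    """Run all jobs; re-raise the first failed row, or record it as NaN."""
    results = await asyncio.gather(
        *(score_row(i) for i in range(n)), return_exceptions=True
    )
    scores: list[float] = []
    for r in results:
        if isinstance(r, BaseException):
            if raise_exceptions:
                raise r
            warnings.warn(f"job failed: {r!r}")  # warn and keep going
            scores.append(math.nan)
        else:
            scores.append(r)
    return scores
```

`asyncio.run(run_jobs(3, raise_exceptions=False))` emits one warning and returns `[0.0, nan, 0.2]`; with `raise_exceptions=True` the `RuntimeError` propagates and the whole run fails, as in the traceback above.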

shawnmittal commented 4 months ago

I also have this problem. Something seems badly broken in how the evaluate function handles threading, but I'm not sure what the cause is.
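
One workaround worth trying while waiting for a fix, offered as an untested assumption rather than a confirmed solution: serialize every call that enters the telemetry context, so two threads can never interleave their push and pop. The sketch models the telemetry state the tracebacks point to as a single list shared by all threads (a hypothetical reconstruction, not the library's source) and holds a `threading.Lock` across the whole `with` block. `TOOL_STACK`, `TELEMETRY_LOCK`, `guarded_call`, and `run_many` are illustrative names. With the lock in place, 50 concurrent chat and embedding "calls" finish without a mismatch.

```python
import contextlib
import threading
import time

# Hypothetical model of the telemetry state: one list shared by every thread.
TOOL_STACK: list[str] = []
# Hypothetical caller-side lock that serializes the whole push/pop window.
TELEMETRY_LOCK = threading.Lock()


@contextlib.contextmanager
def tool_context(name: str):
    """Push `name` on entry; on exit, pop it only if it is still on top."""
    TOOL_STACK.append(name)
    try:
        yield
    finally:
        if not TOOL_STACK or TOOL_STACK[-1] != name:
            raise RuntimeError("Tool context error detected.")
        TOOL_STACK.pop()


def guarded_call(name: str, errors: list[str]) -> None:
    """Run one 'model call' with the lock held from push to pop."""
    with TELEMETRY_LOCK:
        try:
            with tool_context(name):
                time.sleep(0.001)  # stand-in for the real network call
        except RuntimeError as exc:
            errors.append(str(exc))


def run_many(n_threads: int = 50) -> list[str]:
    """Start chat and embedding 'calls' concurrently; return any errors."""
    errors: list[str] = []
    names = ["chat-model", "embeddings"]
    threads = [
        threading.Thread(target=guarded_call, args=(names[i % 2], errors))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors
```

The cost is that model calls no longer overlap, which gives up most of the concurrency `evaluate` is trying to exploit; in a real pipeline the lock would also have to wrap whichever code path actually calls the Vertex AI clients.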

ycjcl868 commented 2 months ago

Same issue here.

jjmachan commented 2 months ago

Can you check whether the latest version (v0.1.11) works for you? We simplified the evaluation function, and that should fix this.
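
Before retrying, it helps to confirm that the environment actually has the release suggested above; the installed version can be read with `importlib.metadata.version("ragas")`. Comparing version strings directly is a trap, since `"0.1.7" > "0.1.11"` is `True` lexicographically. A minimal sketch that compares numeric components instead, assuming plain `X.Y.Z` release strings (pre-release suffixes such as `rc1` are not handled; `packaging.version.Version` covers the full scheme if that dependency is available). `parse_release`, `has_fix`, and `installed_ragas_version` are hypothetical helpers, and the `0.1.11` default simply mirrors the version named in the reply.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def parse_release(v: str) -> tuple[int, ...]:
    """Turn '0.1.11' into (0, 1, 11) so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))


def has_fix(installed: str, minimum: str = "0.1.11") -> bool:
    """True if `installed` is at least `minimum` (plain X.Y.Z strings only)."""
    return parse_release(installed) >= parse_release(minimum)


def installed_ragas_version() -> Optional[str]:
    """Installed ragas version, or None if the package is not installed."""
    try:
        return version("ragas")
    except PackageNotFoundError:
        return None
```

For the version in this report, `has_fix("0.1.7")` returns `False`, while `has_fix("0.1.11")` returns `True`.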