Open franck-cussac opened 6 months ago
I encountered the same error and figured out that it occurs solely due to the "answer_relevancy" metric. I have no issues with any other metrics.
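For reference, dropping that metric would look roughly like the sketch below (a minimal, hypothetical example assuming the standard ragas 0.1.x imports and dataset column names, not anyone's actual code):

```python
# Hypothetical sketch: run the evaluation with answer_relevancy left out.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, context_precision, context_recall

dataset = Dataset.from_dict({
    "question": ["What is Vertex AI?"],
    "answer": ["Vertex AI is Google Cloud's managed ML platform."],
    "contexts": [["Vertex AI is a managed machine learning platform on Google Cloud."]],
    "ground_truth": ["Vertex AI is Google Cloud's managed ML platform."],
})

# answer_relevancy is deliberately omitted from the metrics list
result = evaluate(dataset, metrics=[faithfulness, context_precision, context_recall])
print(result)
```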
I tested without it and I still get an error:
Creating json from Arrow format: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 58.06ba/s]
Evaluating: 9%|███████████████▉ | 202/2232 [01:45<10:10, 3.33it/s]Failed to parse output. Returning None.
Evaluating: 9%|████████████████▋ | 211/2232 [01:49<20:42, 1.63it/s]Failed to parse output. Returning None.
Evaluating: 11%|███████████████████▏ | 243/2232 [01:59<11:26, 2.90it/s]Failed to parse output. Returning None.
Evaluating: 12%|█████████████████████▌ | 273/2232 [02:10<12:32, 2.60it/s]Failed to parse output. Returning None.
Evaluating: 13%|██████████████████████▏ | 282/2232 [02:12<05:35, 5.82it/s]Failed to parse output. Returning None.
Evaluating: 59%|███████████████████████████████████████████████████████████████████████████████████████████████████████ | 1314/2232 [09:25<06:35, 2.32it/s]
Exception in thread Thread-376:
Traceback (most recent call last):
File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/lib/python3.10/site-packages/ragas/executor.py", line 96, in run
results = self.loop.run_until_complete(self._aresults())
File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/lib/python3.10/site-packages/ragas/executor.py", line 84, in _aresults
raise e
File "/lib/python3.10/site-packages/ragas/executor.py", line 79, in _aresults
r = await future
File "/usr/lib/python3.10/asyncio/tasks.py", line 571, in _wait_for_one
return f.result() # May raise f.exception().
File "/lib/python3.10/site-packages/ragas/executor.py", line 38, in sema_coro
return await coro
File "/lib/python3.10/site-packages/ragas/executor.py", line 112, in wrapped_callable_async
return counter, await callable(*args, **kwargs)
File "/lib/python3.10/site-packages/ragas/metrics/base.py", line 116, in ascore
raise e
File "/lib/python3.10/site-packages/ragas/metrics/base.py", line 112, in ascore
score = await self._ascore(row=row, callbacks=group_cm, is_async=is_async)
File "/lib/python3.10/site-packages/ragas/metrics/_context_recall.py", line 147, in _ascore
result = await self.llm.generate(
File "/lib/python3.10/site-packages/ragas/llms/base.py", line 110, in generate
return await loop.run_in_executor(None, generate_text)
File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/lib/python3.10/site-packages/tenacity/__init__.py", line 289, in wrapped_f
return self(f, *args, **kw)
File "/lib/python3.10/site-packages/tenacity/__init__.py", line 379, in __call__
do = self.iter(retry_state=retry_state)
File "/lib/python3.10/site-packages/tenacity/__init__.py", line 325, in iter
raise retry_exc.reraise()
File "/lib/python3.10/site-packages/tenacity/__init__.py", line 158, in reraise
raise self.last_attempt.result()
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
return self.__get_result()
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/lib/python3.10/site-packages/tenacity/__init__.py", line 382, in __call__
result = fn(*args, **kwargs)
File "/lib/python3.10/site-packages/ragas/llms/base.py", line 147, in generate_text
result = self.langchain_llm.generate_prompt(
File "/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 560, in generate_prompt
return self.generate(prompt_messages, stop=stop, callbacks=callbacks, **kwargs)
File "/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 421, in generate
raise e
File "/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 411, in generate
self._generate_with_cache(
File "/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 632, in _generate_with_cache
result = self._generate(
File "/lib/python3.10/site-packages/langchain_google_vertexai/chat_models.py", line 489, in _generate
with telemetry.tool_context_manager(self._user_agent):
File "/usr/lib/python3.10/contextlib.py", line 142, in __exit__
next(self.gen)
File "/lib/python3.10/site-packages/google/cloud/aiplatform/telemetry.py", line 48, in tool_context_manager
_pop_tool_name(tool_name)
File "/lib/python3.10/site-packages/google/cloud/aiplatform/telemetry.py", line 57, in _pop_tool_name
raise RuntimeError(
RuntimeError: Tool context error detected. This can occur due to parallelization.
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "src/evaluation/main.py", line 50, in main
df = evaluation(dataset)
File "src/evaluation/main.py", line 62, in evaluation
return evaluate(
File "/lib/python3.10/site-packages/ragas/evaluation.py", line 231, in evaluate
raise e
File "/lib/python3.10/site-packages/ragas/evaluation.py", line 213, in evaluate
raise ExceptionInRunner()
ragas.exceptions.ExceptionInRunner: The runner thread which was running the jobs raised an exeception. Read the traceback above to debug it. You can also pass `raise_exceptions=False` incase you want to show only a warning message instead.
I also have this problem. Something is very broken with the evaluate function when it comes to threading. Not sure what the problem is.
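If the root cause really is the Vertex AI telemetry context manager being entered and exited from many threads at once, one thing worth trying is lowering the executor's parallelism. A hedged sketch, assuming ragas 0.1.x exposes max_workers on RunConfig and raise_exceptions on evaluate (the latter is mentioned in the error message above):

```python
# Hypothetical workaround: serialize LLM calls so the Vertex AI telemetry
# tool_context_manager is not pushed/popped concurrently.
from ragas import evaluate
from ragas.run_config import RunConfig

result = evaluate(
    dataset,                               # your evaluation Dataset
    metrics=metrics,                       # your metrics list
    run_config=RunConfig(max_workers=1),   # one worker instead of the default pool
    raise_exceptions=False,                # warn on per-row failures instead of aborting
)
```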
same issue
Can you check whether the latest version [v0.1.11] works for you? We made some simplifications to the evaluation function that should fix this.
Describe the bug
When I run an evaluation, it sometimes works and sometimes fails with:
RuntimeError: Tool context error detected. This can occur due to parallelization
Ragas version: 0.1.7
Python version: 3.10.12
Code to Reproduce
Error trace
Expected behavior
An evaluation that works 100% of the time.
Additional context
I'm using Vertex AI and I followed the example notebook.
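For context, the call site in such a setup typically looks something like the sketch below. This is a hypothetical reconstruction pieced together from the traceback and the Vertex AI example; the model names, wrapper, and embeddings are assumptions, not the actual src/evaluation/main.py:

```python
# Hypothetical reconstruction of the failing setup, not the original code.
from datasets import Dataset
from langchain_google_vertexai import ChatVertexAI, VertexAIEmbeddings
from ragas import evaluate
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import (
    answer_relevancy,
    context_precision,
    context_recall,
    faithfulness,
)

# Vertex AI chat model and embeddings wrapped for ragas (model names assumed)
vertex_llm = LangchainLLMWrapper(ChatVertexAI(model_name="gemini-1.0-pro"))
vertex_embeddings = VertexAIEmbeddings(model_name="textembedding-gecko")

def evaluation(dataset: Dataset):
    # ragas fans the per-row metric calls out over an internal executor;
    # those parallel calls are where the Vertex AI telemetry context manager fails.
    return evaluate(
        dataset,
        metrics=[faithfulness, answer_relevancy, context_precision, context_recall],
        llm=vertex_llm,
        embeddings=vertex_embeddings,
    ).to_pandas()
```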