explodinggradients / ragas

Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines
https://docs.ragas.io
Apache License 2.0

Ragas Synthetic Test Data Generation error using AzureOpenaiEmbeddings #866

Open sona-16 opened 2 months ago

sona-16 commented 2 months ago

I was trying to generate a synthetic test set with the Ragas framework, following https://docs.ragas.io/en/stable/concepts/testset_generation.html, and I'm getting an error.

Please have a look at the error below.

    ragas.exceptions.ExceptionInRunner: The runner thread which was running the jobs raised an exeception. Read the traceback above to debug it. You can also pass raise_exceptions=False incase you want to show only a warning message instead.
    Task was destroyed but it is pending!
    task: <Task pending name='Task-3' coro=<as_completed.<locals>.sema_coro() running at C:\Users\sonaganesh.g\Desktop\RAGAS syhthetic data generation\syn\Lib\site-packages\ragas\executor.py:38> wait_for= cb=[as_completed.<locals>._on_completion() at C:\Users\sonaganesh.g\AppData\Local\Programs\Python\Python312\Lib\asyncio\tasks.py:618]>
    Task was destroyed but it is pending!
    task: <Task pending name='Task-5' coro=<as_completed.<locals>.sema_coro() running at C:\Users\sonaganesh.g\Desktop\RAGAS syhthetic data generation\syn\Lib\site-packages\ragas\executor.py:38> wait_for=<_GatheringFuture pending cb=[Task.task_wakeup()]> cb=[as_completed.<locals>._on_completion() at C:\Users\sonaganesh.g\AppData\Local\Programs\Python\Python312\Lib\asyncio\tasks.py:618]>
    Task was destroyed but it is pending!
    task: <Task pending name='Task-2' coro=<as_completed.<locals>.sema_coro() running at C:\Users\sonaganesh.g\Desktop\RAGAS syhthetic data generation\syn\Lib\site-packages\ragas\executor.py:38> wait_for= cb=[as_completed.<locals>._on_completion() at C:\Users\sonaganesh.g\AppData\Local\Programs\Python\Python312\Lib\asyncio\tasks.py:618]>

And my code is:

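    # Imports assumed from the Ragas 0.1.x docs; llm() and embeddings() are my own helpers (not shown).
    from langchain_community.document_loaders import PyPDFLoader
    from ragas.testset.generator import TestsetGenerator
    from ragas.testset.evolutions import simple, reasoning, multi_context

    loader = PyPDFLoader(" .pdf")
    documents = loader.load()

    # used same LLM for both generator_llm and critic_llm
    generator_llm = llm()
    critic_llm = llm()
    embeddings = embeddings()  # used AzureOpenaiEmbeddings

    generator = TestsetGenerator.from_langchain(generator_llm, critic_llm, embeddings)
    distributions = {simple: 0.5, multi_context: 0.4, reasoning: 0.1}

    testset = generator.generate_with_langchain_docs(documents, 10, distributions)
    testset.to_pandas()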

Versions: openai 1.17.0, ragas 0.1.4 / 0.1.6
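
For context on the log above: "Task was destroyed but it is pending!" is the warning asyncio prints when a Task object is garbage-collected while it is still pending. It is usually a side effect of an earlier failure rather than the root cause. The following is a minimal sketch of the general pattern in plain asyncio, not the Ragas executor itself; `job`, `run_all` and the failing job number are hypothetical. It runs jobs behind a semaphore and, on the first failure, cancels and awaits the remaining tasks so that none is left pending.

```python
import asyncio


async def job(i: int) -> int:
    """Hypothetical stand-in for one generation job; job 2 fails on purpose."""
    await asyncio.sleep(0.01 * i)
    if i == 2:
        raise ValueError(f"job {i} failed")
    return i


async def run_all(n: int, max_workers: int = 2):
    """Run n jobs with bounded concurrency; on the first failure, cancel the rest."""
    semaphore = asyncio.Semaphore(max_workers)

    async def sema_coro(i: int) -> int:
        # Limit how many jobs run at the same time.
        async with semaphore:
            return await job(i)

    tasks = [asyncio.create_task(sema_coro(i)) for i in range(n)]
    results = []
    try:
        for future in asyncio.as_completed(tasks):
            results.append(await future)
    except Exception as exc:
        # Cancel whatever is still pending and wait for the cancellations to
        # finish, so no task is still pending when the event loop closes.
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
        return results, exc
    return results, None


results, error = asyncio.run(run_all(5))
print(results, repr(error))
```

The point of the sketch is that the useful information is the first exception (`error` here), so the traceback printed before the "Task was destroyed" lines is the one worth reading.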

jjmachan commented 2 months ago

I don't think this is a Ragas issue - are you still facing this @sona-16 ?

sona-16 commented 2 months ago

Hi jjmachan,

I haven't tried this again yet. I will do so today and update you.

Satyamlilly commented 2 months ago

I am facing this same issue.

    Filename and doc_id are the same for all nodes.
    Generating: 52%|█████▎ | 21/40 [02:41<02:26, 7.69s/it]
    Exception in thread Thread-21:
    Traceback (most recent call last):
      File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/threading.py", line 973, in _bootstrap_inner
        self.run()
      File "/Users/L037301/Documents/GitHub/ragas/src/ragas/executor.py", line 96, in run
        results = self.loop.run_until_complete(self._aresults())
      File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
        return future.result()
      File "/Users/L037301/Documents/GitHub/ragas/src/ragas/executor.py", line 84, in _aresults
        raise e
      File "/Users/L037301/Documents/GitHub/ragas/src/ragas/executor.py", line 79, in _aresults
        r = await future
      File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/asyncio/tasks.py", line 614, in _wait_for_one
        return f.result()  # May raise f.exception().
      File "/Users/L037301/Documents/GitHub/ragas/src/ragas/executor.py", line 38, in sema_coro
        return await coro
      File "/Users/L037301/Documents/GitHub/ragas/src/ragas/executor.py", line 112, in wrapped_callable_async
        return counter, await callable(*args, **kwargs)
      File "/Users/L037301/Documents/GitHub/ragas/src/ragas/testset/evolutions.py", line 144, in evolve
        return await self.generate_datarow(
      File "/Users/L037301/Documents/GitHub/ragas/src/ragas/testset/evolutions.py", line 210, in generate_datarow
        selected_nodes = [
      File "/Users/L037301/Documents/GitHub/ragas/src/ragas/testset/evolutions.py", line 213, in <listcomp>
        if i - 1 < len(current_nodes.nodes)
    TypeError: unsupported operand type(s) for -: 'NoneType' and 'int'

    ExceptionInRunner                         Traceback (most recent call last)
    File /Users/L037301/Documents/GitHub/ragas/src/ragas/tryout.py:1
    ----> 1 testset4 = generator.generate_with_langchain_docs(pages4, test_size=40, distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25})

    File ~/Documents/GitHub/ragas/src/ragas/testset/generator.py:179, in TestsetGenerator.generate_with_langchain_docs(self, documents, test_size, distributions, with_debugging_logs, is_async, raise_exceptions, run_config)
        174 # chunk documents and add to docstore
        175 self.docstore.add_documents(
        176     [Document.from_langchain_document(doc) for doc in documents]
        177 )
    --> 179 return self.generate(
        180     test_size=test_size,
        181     distributions=distributions,
        182     with_debugging_logs=with_debugging_logs,
        183     is_async=is_async,
        184     raise_exceptions=raise_exceptions,
        185     run_config=run_config,
        186 )

    File ~/Documents/GitHub/ragas/src/ragas/testset/generator.py:274, in TestsetGenerator.generate(self, test_size, distributions, with_debugging_logs, is_async, raise_exceptions, run_config)
        272 test_data_rows = exec.results()
        273 if not test_data_rows:
    --> 274     raise ExceptionInRunner()
        276 except ValueError as e:
        277     raise e

ExceptionInRunner: The runner thread which was running the jobs raised an exeception. Read the traceback above to debug it. You can also pass raise_exceptions=False incase you want to show only a warning message instead.
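
Reading the traceback above: the TypeError comes from the expression `i - 1` in evolutions.py, which means `i` was `None` at that point. The traceback does not show where the `None` came from. As a rough illustration only (this is not the Ragas code, and `select_nodes` is a hypothetical helper), a selection step that assumes 1-based indices could skip unusable values instead of crashing:

```python
from typing import List, Optional, Sequence


def select_nodes(nodes: Sequence[str], indices: Sequence[Optional[int]]) -> List[str]:
    """Pick nodes by 1-based index, skipping missing or out-of-range indices."""
    selected = []
    for i in indices:
        # `None - 1` raises the TypeError shown in the traceback, so reject
        # anything that is not a usable integer index first.
        if not isinstance(i, int) or isinstance(i, bool):
            continue
        if 1 <= i <= len(nodes):
            selected.append(nodes[i - 1])
    return selected


print(select_nodes(["a", "b", "c"], [1, None, 3, 7]))  # prints ['a', 'c']
```

Skipping bad indices hides the symptom but not the cause, so it would still be worth checking why an index is missing in the first place.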

Sarrion commented 1 month ago

Any update? I'm getting the same message when calling ragas.evaluate inside a ThreadPoolExecutor.

damlitos commented 1 month ago

I have the same issue. Everything worked fine for a set of 300 PDFs, and now, all of a sudden, the same code gives the error below:

ExceptionInRunner: The runner thread which was running the jobs raised an exeception. Read the traceback above to debug it. You can also pass raise_exceptions=False incase you want to show only a warning message instead.

sona-16 commented 1 month ago

Hi Team,

I'm still facing the same error. I suspect it is caused either by the Hugging Face LLM I'm using or by limited compute power: I'm running this task in a Google Colab notebook with a CPU runtime.

Hi @damlitos, could you please share your code snippet? I have used a PDF of only 2 pages and still get no proper output. As you said, you may be getting a satisfactory result with a smaller number of pages.

error:

    embedding nodes:   0%  0/4 [02:27<?, ?it/s]
    Exception in thread Thread-13:
    Traceback (most recent call last):
      File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
        self.run()
      File "/usr/local/lib/python3.10/dist-packages/ragas/executor.py", line 96, in run
        results = self.loop.run_until_complete(self._aresults())
      File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
        return future.result()
      File "/usr/local/lib/python3.10/dist-packages/ragas/executor.py", line 84, in _aresults
        raise e
      File "/usr/local/lib/python3.10/dist-packages/ragas/executor.py", line 79, in _aresults
        r = await future
      File "/usr/lib/python3.10/asyncio/tasks.py", line 571, in _wait_for_one
        return f.result()  # May raise f.exception().
      File "/usr/local/lib/python3.10/dist-packages/ragas/executor.py", line 38, in sema_coro
        return await coro
      File "/usr/local/lib/python3.10/dist-packages/ragas/executor.py", line 112, in wrapped_callable_async
        return counter, await callable(*args, **kwargs)
      File "/usr/local/lib/python3.10/dist-packages/ragas/embeddings/base.py", line 23, in embed_text
        embs = await self.embed_texts([text], is_async=is_async)
      File "/usr/local/lib/python3.10/dist-packages/ragas/embeddings/base.py", line 33, in embed_texts
        return await aembed_documents_with_retry(texts)
      File "/usr/local/lib/python3.10/dist-packages/tenacity/_asyncio.py", line 142, in async_wrapped
        return await fn(*args, **kwargs)
      File "/usr/local/lib/python3.10/dist-packages/tenacity/_asyncio.py", line 58, in __call__
        do = await self.iter(retry_state=retry_state)
      File "/usr/local/lib/python3.10/dist-packages/tenacity/_asyncio.py", line 110, in iter
        result = await action(retry_state)
      File "/usr/local/lib/python3.10/dist-packages/tenacity/_asyncio.py", line 78, in inner
        return fn(*args, **kwargs)
      File "/usr/local/lib/python3.10/dist-packages/tenacity/__init__.py", line 410, in exc_check
        raise retry_exc.reraise()
      File "/usr/local/lib/python3.10/dist-packages/tenacity/__init__.py", line 183, in reraise
        raise self.last_attempt.result()
      File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
        return self.__get_result()
      File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
        raise self._exception
      File "/usr/local/lib/python3.10/dist-packages/tenacity/_asyncio.py", line 61, in __call__
        result = await fn(*args, **kwargs)
      File "/usr/local/lib/python3.10/dist-packages/ragas/embeddings/base.py", line 64, in aembed_documents
        return await self.embeddings.aembed_documents(texts)
      File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1709, in __getattr__
        raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
    AttributeError: 'SentenceTransformer' object has no attribute 'aembed_documents'

    ExceptionInRunner                         Traceback (most recent call last)
    in <cell line: 1>()
    ----> 1 testset = generator.generate_with_langchain_docs(pages, test_size=10, distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25})

    2 frames
    /usr/local/lib/python3.10/dist-packages/ragas/testset/docstore.py in add_nodes(self, nodes, show_progress)
        252 results = executor.results()
        253 if not results:
    --> 254     raise ExceptionInRunner()
        255
        256 for i, n in enumerate(nodes):

ExceptionInRunner: The runner thread which was running the jobs raised an exeception. Read the traceback above to debug it. You can also pass raise_exceptions=False incase you want to show only a warning message instead.
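
One thing the traceback above does show: the last frame is `self.embeddings.aembed_documents(texts)` failing with "'SentenceTransformer' object has no attribute 'aembed_documents'". So the object passed as `embeddings` appears to be a raw SentenceTransformer model rather than an object exposing the LangChain-style embedding methods. Below is a minimal sketch of an adapter, assuming the model offers an `encode(list_of_texts)` method; `EncoderEmbeddings` and `FakeEncoder` are hypothetical names, and I have not verified that these four methods are everything Ragas needs.

```python
import asyncio
from typing import Any, List


class EncoderEmbeddings:
    """Hypothetical adapter: exposes embed_* / aembed_* methods on top of
    any object that has an `encode(list_of_texts)` method."""

    def __init__(self, encoder: Any) -> None:
        self.encoder = encoder

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # Convert whatever the encoder returns into plain lists of floats.
        return [[float(x) for x in vector] for vector in self.encoder.encode(texts)]

    def embed_query(self, text: str) -> List[float]:
        return self.embed_documents([text])[0]

    async def aembed_documents(self, texts: List[str]) -> List[List[float]]:
        # Run the blocking encoder in a worker thread so the event loop stays free.
        return await asyncio.to_thread(self.embed_documents, texts)

    async def aembed_query(self, text: str) -> List[float]:
        return (await self.aembed_documents([text]))[0]


class FakeEncoder:
    """Stand-in for a real model, used only to make the sketch runnable."""

    def encode(self, texts: List[str]) -> List[List[float]]:
        return [[float(len(t)), 1.0] for t in texts]


embeddings = EncoderEmbeddings(FakeEncoder())
vectors = asyncio.run(embeddings.aembed_documents(["ragas", "test"]))
print(vectors)
```

In practice a ready-made LangChain wrapper such as HuggingFaceEmbeddings is probably the simpler route than a hand-written adapter; the sketch is only meant to show which methods the failing call expects.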