Closed: Sam-364 closed this issue 1 month ago.
Hello, @Sam-364! I'm here to assist you with resolving bugs, answering questions, and becoming a contributor. Let's work together to tackle this issue!
There are a few similar issues that might help you resolve this problem:
- Generate test data for 1 pdf (open)
- New executor throws RuntimeError: ... got Future <..> attached to a different loop (closed)

The suggested fix is to use nest_asyncio to handle event loops in Jupyter notebooks. You can try adding the following code to your script:
import nest_asyncio
nest_asyncio.apply()
The ExceptionInRunner error occurs because the generate_with_langchain_docs function calls the generate method with the raise_exceptions parameter set to True. This means any exception during the generation process will be raised [1].
Here is a modified version of your code with the nest_asyncio fix:
import torch
import pymupdf
import nest_asyncio

from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain_ollama import ChatOllama
from langchain.embeddings import HuggingFaceEmbeddings
from langchain_core.documents import Document as LCDocument

# Patch the notebook's already-running event loop so ragas' async calls work
nest_asyncio.apply()

# Use the GPU when available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Embedding model used by the testset generator
model_name = "sentence-transformers/all-mpnet-base-v2"
model_kwargs = {"device": device}

# Local Ollama models for generation and critique
generator_llm = ChatOllama(model="llama3", temperature=0.2)
critic_llm = ChatOllama(model="llama3.1", temperature=0.2)

embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)

def load_pdf_as_document(file_path):
    # Concatenate the text of every page into a single LangChain Document
    doc = pymupdf.open(file_path)
    text = ""
    for page in doc:
        text += page.get_text()
    return LCDocument(page_content=text)

pdf_path = "/content/1.pdf"
document = load_pdf_as_document(pdf_path)
documents = [document]
print(documents)

generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings,
)

testset = generator.generate_with_langchain_docs(
    documents=documents,
    test_size=10,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
    with_debugging_logs=True,
    is_async=True,
    raise_exceptions=False,  # keep going instead of aborting on per-sample errors
)

test_df = testset.to_pandas()
test_df.head()
Try running this updated script and see if it resolves the issue. If the problem persists, consider checking the detailed error trace for any specific issues related to your environment or dependencies.
To continue talking to Dosu, mention @dosu.
I had previously used nest_asyncio.apply(), but later realized that the is_async=True parameter in the testset generator already does the same thing, without any recurring effect. P.S. Using the suggested changes will also not work, because the testset module is already in an async state.
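If nest_asyncio.apply() and is_async=True end up fighting over the same loop, a minimal sketch of an alternative, assuming your ragas 0.1.x build supports toggling the flag, is to skip nest_asyncio and run the generator fully synchronously:

# Sketch only: run synchronously so the generator never shares the
# notebook's already-running event loop. Assumes this ragas version's
# generate_with_langchain_docs accepts the is_async parameter.
testset = generator.generate_with_langchain_docs(
    documents=documents,
    test_size=10,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
    is_async=False,          # synchronous execution; slower, but avoids loop conflicts
    raise_exceptions=False,
)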
After downgrading the langchain family of packages, the same issue was solved. My package versions are now:
langchain 0.2.16
langchain-community 0.2.0
langchain-core 0.2.41
langchain-openai 0.1.20
langchain-text-splitters 0.2.4
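For anyone who wants to reproduce that working combination, the equivalent pin set as a single command (assuming a pip-based environment) would be:

pip install langchain==0.2.16 langchain-community==0.2.0 langchain-core==0.2.41 langchain-openai==0.1.20 langchain-text-splitters==0.2.4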
Yes, I forgot to comment: I did the same, downgrading the versions of the packages, and it worked for me. The only remaining issue is that it takes an eternity to generate the dataset. P.S. Hoping to have a discussion on that issue later on!
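On the slowness: one knob worth trying, as a sketch only and assuming your ragas 0.1.x release accepts a run_config argument here, is the RunConfig that controls concurrency and timeouts:

from ragas.run_config import RunConfig

# Sketch: tune concurrency/timeouts for a local Ollama backend.
# Exact parameter support may vary across ragas 0.1.x releases.
run_config = RunConfig(
    timeout=180,     # seconds allowed per LLM call
    max_retries=3,   # give up sooner on failing calls
    max_workers=4,   # local models usually can't serve many parallel requests
)

testset = generator.generate_with_langchain_docs(
    documents=documents,
    test_size=10,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
    raise_exceptions=False,
    run_config=run_config,
)

Lowering test_size is the other easy lever if you just need a quick sanity check.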
For now, I'm closing the issue.
I have checked both the ragas documentation and the ragas-langchain integration documentation and couldn't resolve my issue.
The bug
Whenever I try to execute the testset using the generator module's generate_with_langchain_docs function, I get a thread-handling error and the execution stops abruptly. I tried downgrading to ragas==0.1.7, which was the last version without such errors, but that didn't work either. Even with generate_with_llamaindex_docs the same error is encountered, so I tried to combine the best of the two frameworks (using langchain's document loader together with llamaindex's generate_with_llamaindex_docs), but the issue persisted. I have followed the documentation thoroughly, but the bug couldn't be fixed. Passing the prescribed raise_exceptions=False also has no effect. I used Ollama-based local Llama models as generator_llm and critic_llm. I checked the individual arguments, but it is not working.
Ragas version: 0.1.20
Python version: 3.10.12
Here is my detailed code:
Error trace
While executing the following code block, I am getting the following error:
Expected behavior
The script was intended to create a synthetic QnA dataset, based on the docs I provided, for evaluating a RAG pipeline, but execution halts abruptly due to the bug. All the modules except the testset module are working.
Hoping for a quick fix, because I have seen many similar issues reported against the generator module.