Closed · os1ma closed this issue 2 months ago
Hello, @os1ma! I'm a bot designed to help you solve bugs, answer questions, and become a contributor. I'm here to assist you while waiting for a human maintainer.

I found similar issues related to the `ExceptionInRunner` error when using LangChain v0.3 with Ragas. Here are the relevant discussions and solutions:
- Open issue: The runner thread which was running the jobs raised an exception. The suggestion there is to set `raise_exceptions=False` so that failures produce only a warning message instead of raising an exception. This can help you debug by surfacing a warning without stopping execution.
- Closed issue: ExceptionInRunner: The runner thread which was running the jobs raised an exception. The suggestion there is to install and apply `nest-asyncio` before your Ragas calls:

```
pip install nest-asyncio
```

```python
import nest_asyncio
nest_asyncio.apply()
```
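For reference, `raise_exceptions` is a keyword argument of the generation call itself. A minimal sketch of where it goes (the `generator`, `docs`, and `distributions` names are assumed to be defined as in the examples later in this thread):

```python
# Hedged sketch: with raise_exceptions=False, failing jobs emit a
# warning instead of aborting the whole testset generation run.
testset = generator.generate_with_langchain_docs(
    docs,
    test_size=10,
    distributions=distributions,
    raise_exceptions=False,
)
```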
To resolve the `ExceptionInRunner` error in `TestsetGenerator` when using LangChain v0.3, you need to update the imports to use Pydantic v2 directly. Here are the steps:

Update imports in `docstore.py`, replacing the `langchain_core.pydantic_v1` import with:

```python
from pydantic import Field
```
Update the `TestsetGenerator` class: ensure that all Pydantic models and their usage are updated to be compatible with Pydantic v2. Here is an example:
```python
import typing as t

from pydantic import BaseModel, ConfigDict

# Import paths below follow the ragas 0.1.x source layout;
# adjust them if your installed version differs.
from langchain_core.embeddings import Embeddings as LangchainEmbeddings
from langchain_core.language_models import BaseLanguageModel as LangchainLLM
from llama_index.core.base.embeddings.base import BaseEmbedding as LlamaIndexEmbeddings
from llama_index.core.base.llms.base import BaseLLM as LlamaindexLLM
from ragas.embeddings import (
    BaseRagasEmbeddings,
    LangchainEmbeddingsWrapper,
    LlamaIndexEmbeddingsWrapper,
)
from ragas.llms import BaseRagasLLM, LangchainLLMWrapper, LlamaIndexLLMWrapper
from ragas.run_config import RunConfig
from ragas.testset.docstore import DocumentStore, InMemoryDocumentStore
from ragas.testset.extractor import KeyphraseExtractor


class TestsetGenerator(BaseModel):
    # Field types such as DocumentStore are not Pydantic models,
    # so Pydantic v2 needs arbitrary_types_allowed here.
    model_config = ConfigDict(arbitrary_types_allowed=True)

    generator_llm: BaseRagasLLM
    critic_llm: BaseRagasLLM
    embeddings: BaseRagasEmbeddings
    docstore: DocumentStore

    @classmethod
    def from_langchain(
        cls,
        generator_llm: LangchainLLM,
        critic_llm: LangchainLLM,
        embeddings: LangchainEmbeddings,
        docstore: t.Optional[DocumentStore] = None,
        run_config: RunConfig = RunConfig(),
        chunk_size: int = 1024,
    ) -> "TestsetGenerator":
        generator_llm_model = LangchainLLMWrapper(generator_llm)
        critic_llm_model = LangchainLLMWrapper(critic_llm)
        embeddings_model = LangchainEmbeddingsWrapper(embeddings)
        keyphrase_extractor = KeyphraseExtractor(llm=generator_llm_model)
        if docstore is None:
            from langchain.text_splitter import TokenTextSplitter

            splitter = TokenTextSplitter(chunk_size=chunk_size, chunk_overlap=0)
            docstore = InMemoryDocumentStore(
                splitter=splitter,
                embeddings=embeddings_model,
                extractor=keyphrase_extractor,
                run_config=run_config,
            )
        return cls(
            generator_llm=generator_llm_model,
            critic_llm=critic_llm_model,
            embeddings=embeddings_model,
            docstore=docstore,
        )

    @classmethod
    def from_llama_index(
        cls,
        generator_llm: LlamaindexLLM,
        critic_llm: LlamaindexLLM,
        embeddings: LlamaIndexEmbeddings,
        docstore: t.Optional[DocumentStore] = None,
        run_config: RunConfig = RunConfig(),
    ) -> "TestsetGenerator":
        generator_llm_model = LlamaIndexLLMWrapper(generator_llm)
        critic_llm_model = LlamaIndexLLMWrapper(critic_llm)
        embeddings_model = LlamaIndexEmbeddingsWrapper(embeddings)
        keyphrase_extractor = KeyphraseExtractor(llm=generator_llm_model)
        if docstore is None:
            from langchain.text_splitter import TokenTextSplitter

            splitter = TokenTextSplitter(chunk_size=1024, chunk_overlap=0)
            docstore = InMemoryDocumentStore(
                splitter=splitter,
                embeddings=embeddings_model,
                extractor=keyphrase_extractor,
                run_config=run_config,
            )
        return cls(
            generator_llm=generator_llm_model,
            critic_llm=critic_llm_model,
            embeddings=embeddings_model,
            docstore=docstore,
        )
```
Update other imports: ensure that other parts of the codebase also import from Pydantic directly, for example in `prompts.py` and `_answer_relevance.py`:

```python
from pydantic import BaseModel
```
By making these updates, you should be able to resolve the `ExceptionInRunner` error and ensure compatibility with LangChain v0.3 and Pydantic v2 [1][2][3][4].
To continue talking to Dosu, mention @dosu.
I'm having the same issue as well!
@frieda-huang Which file type are you using for test data generation? Is it PDF?
Hey, we will take a look at this soon; the CI is also failing for the same reason. For now, please keep using langchain <0.3 with Ragas. We are also preparing to release a new and improved version of test data generation with the v0.2 release (in 2 weeks): https://github.com/explodinggradients/ragas/pull/1321
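In the meantime, pinning the constraint at install time is the simplest way to stay on compatible versions. A hedged sketch (the exact specifiers are an assumption consistent with the advice above, not an official requirements file):

```
pip install "ragas<0.2" "langchain<0.3" "langchain-core<0.3"
```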
thanks a lot for bringing this up 🙌🏽. The new LangChain v0.3 will break the current usage of metrics, so the plan of action is as follows:

- for ragas<0.2 we will pin langchain_core to <0.3
- for ragas>0.2 we will depend directly on pydantic>=2
- cut a new release with the updated dependency as well
Yes.
> On Wed, Sep 18, 2024 at 12:33 AM, Akshay Wanje wrote:
> @frieda-huang Which file type are you using for test data generation? Is it PDF?
@frieda-huang I have also been stuck on the same thing for two days. I don't know if Ragas supports PDF ingestion for test data generation, but converting the PDF to a .txt file and trying that works.
Test data generation on PDFs works for me, but it's super slow. I'm generating test data from 3 papers from CVPR 2019. I'm using open-source models, so the output is not ideal, but it's somewhat useful.
My code looks something like this:

```python
from langchain_community.document_loaders import DirectoryLoader
from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings
from langchain_ollama import OllamaLLM
from ragas.testset.evolutions import multi_context, reasoning, simple
from ragas.testset.generator import TestsetGenerator

# DATA_DIR and HF_API_KEY are defined elsewhere in my project.
loader = DirectoryLoader(DATA_DIR)
docs = loader.load()

llm_model = "llama3.1:8b-instruct-fp16"
llm = OllamaLLM(model=llm_model)

embed_model = "sentence-transformers/all-MiniLM-l6-v2"
embeddings = HuggingFaceInferenceAPIEmbeddings(
    api_key=HF_API_KEY,
    model_name=embed_model,
)

generator = TestsetGenerator.from_langchain(
    generator_llm=llm, critic_llm=llm, embeddings=embeddings
)

testset = generator.generate_with_langchain_docs(
    docs,
    test_size=20,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
)

df = testset.to_pandas()
df.to_csv("testset_output.csv", index=False)
```
The output looks like the following (quoted verbatim, PDF-extraction artifacts included):

```
learning rule defined as a function of the perceptual prediction error defined in Section 3.2 and is defined as\n\nλlearn = \uf8f1 \uf8f4\uf8f2\n\n− t λinit, EP (t) > µe t λinit, EP (t) < µe otherwise\n\n∆ ∆+ λinit,\n\n\uf8f4\uf8f3\n\n− t , ∆+\n\nt and λinit refer to the scaling of the learning where ∆ rate in the negative direction, positive direction and the ini- t2 tial learning rate respectively and µe = 1 t1 EP dEP . The learning rate is adjusted based on the quality of the predictions characterized by the perceptual prediction er- ror between a temporal sequence between times t1 and t2, typically defined by the gating signal.. The impact of the adaptive changes to the learning rate is shown in the quan- titative evaluation Section 4.4, where the adaptive learn- ing scheme shows improvement of up to 20% compared to training without the learning scheme.\n\nt2−t1\n\nR\n\n3.5.
```
For context, I also downgraded the versions of langchain and langchain-ollama:

```toml
langchain = "0.2.11"
langchain-ollama = "0.1.3"
langchain-huggingface = "0.0.3"
```
@frieda-huang Did you face a connection timeout issue while running the models? I get `ConnectError: All connection attempts failed`:

```
The above exception was the direct cause of the following exception:

ConnectError                              Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/httpx/_transports/default.py in map_httpcore_exceptions()
     87
     88     message = str(exc)
---> 89     raise mapped_exc(message) from exc
     90
     91

ConnectError: All connection attempts failed
```
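This httpx `ConnectError` generally means the client never reached the model server at all. A quick check worth running first, assuming a local Ollama server on its default port (11434; adjust the URL for your setup):

```python
# Hypothetical connectivity check before starting a long generation run.
import httpx

try:
    response = httpx.get("http://localhost:11434", timeout=5.0)
    print("Ollama reachable:", response.status_code)
except httpx.ConnectError as exc:
    print("Ollama not reachable:", exc)
```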
> @frieda-huang Did you face a connection timeout issue while running the models? `ConnectError: All connection attempts failed`
I don't have that issue, but it's behaving very weirdly. I'm now generating 50 tests based on 10 papers. I let my laptop (Apple M2) run throughout the night, and it's still at 32% of the generation. It would sometimes tell me generation failed, return None, and then make more progress. Now I just hope nothing in my code leads to a crash. I'm also caching the embeddings, but that doesn't seem to improve performance.
> @frieda-huang Did you face a connection timeout issue while running the models? `ConnectError: All connection attempts failed`
New update: I got the following error after 10 hours of generating :(

```
  result = await callable(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/friedahuang/Documents/csye7230/.venv/lib/python3.12/site-packages/ragas/testset/evolutions.py", line 143, in evolve
    ) = await self._aevolve(current_tries, current_nodes)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/friedahuang/Documents/csye7230/.venv/lib/python3.12/site-packages/ragas/testset/evolutions.py", line 554, in _aevolve
    result = await self._acomplex_evolution(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/friedahuang/Documents/csye7230/.venv/lib/python3.12/site-packages/ragas/testset/evolutions.py", line 411, in _acomplex_evolution
    return await self.aretry_evolve(current_tries, current_nodes)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/friedahuang/Documents/csye7230/.venv/lib/python3.12/site-packages/ragas/testset/evolutions.py", line 121, in aretry_evolve
    return await self._aevolve(current_tries, current_nodes)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/friedahuang/Documents/csye7230/.venv/lib/python3.12/site-packages/ragas/testset/evolutions.py", line 554, in _aevolve
    result = await self._acomplex_evolution(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/friedahuang/Documents/csye7230/.venv/lib/python3.12/site-packages/ragas/testset/evolutions.py", line 382, in _acomplex_evolution
    simple_question, current_nodes, _ = await self.se._aevolve(
                                        ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/friedahuang/Documents/csye7230/.venv/lib/python3.12/site-packages/ragas/testset/evolutions.py", line 298, in _aevolve
    passed = await self.node_filter.filter(merged_node)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/friedahuang/Documents/csye7230/.venv/lib/python3.12/site-packages/ragas/testset/filters.py", line 60, in filter
    output["score"] = sum(output.values()) / len(output.values())
                      ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~
ZeroDivisionError: division by zero
sys:1: RuntimeWarning: coroutine 'Executor.wrap_callable_with_index.<locals>.wrapped_callable_async' was never awaited
```
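Reading the bottom frame: `filters.py` averages the critic's scores with `sum(output.values()) / len(output.values())`, which divides by zero whenever the parsed output dict is empty. A hypothetical defensive guard illustrating the failure mode (not the official fix):

```python
# Hypothetical patch sketch for the averaging step in filters.py:
# skip the division when the critic LLM returned no parseable scores.
values = list(output.values())
output["score"] = sum(values) / len(values) if values else 0.0
```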
hey @frieda-huang @wanjeakshay we are really sorry about this, but it looks like something broke in the testset generation part. We have a new version of this coming soon (#1321), and sadly I would suggest you wait for that 🙁
Thank you, @jjmachan! That would be very much appreciated! Do we know when the new version will be released?
we are aiming for the end of this month 🤞🏽
Describe the bug
LangChain recently released v0.3. When using LangChain v0.3, TestsetGenerator raises an ExceptionInRunner.

As of v0.3, LangChain internally uses Pydantic v2, while ragas internally uses langchain_core.pydantic_v1. This mismatch is likely the cause of the error.

The LangChain v0.3 migration guide is here: https://python.langchain.com/docs/versions/v0_3/

Using LangChain v0.3 likely introduces numerous other errors as well, so ragas needs to be updated to be compatible with LangChain v0.3.
Versions

I encountered this error on Google Colab. ragas 0.1.18 (latest) also raises the same error.
Code to Reproduce
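A minimal sketch of the setup that triggers the error (the model choices here are placeholder assumptions; any LangChain v0.3 LLM and embeddings should exercise the same path):

```python
# Assumes langchain / langchain-core >= 0.3 and ragas 0.1.x are installed.
from langchain_community.document_loaders import DirectoryLoader
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from ragas.testset.evolutions import simple
from ragas.testset.generator import TestsetGenerator

docs = DirectoryLoader("data/").load()

generator = TestsetGenerator.from_langchain(
    generator_llm=ChatOpenAI(model="gpt-4o-mini"),
    critic_llm=ChatOpenAI(model="gpt-4o-mini"),
    embeddings=OpenAIEmbeddings(),
)

# With LangChain v0.3 this raises ExceptionInRunner: "The runner thread
# which was running the jobs raised an exception."
testset = generator.generate_with_langchain_docs(
    docs, test_size=5, distributions={simple: 1.0}
)
```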
Error trace
Expected behavior

Testset generation succeeds without any error.
Additional context
Based on my investigation, I found that at least the default-value handling for the Document and Node classes in ragas/testset/docstore.py behaves incorrectly.
https://github.com/explodinggradients/ragas/blob/c40891bf168de3124c845b75af31ed557eb79709/src/ragas/testset/docstore.py#L31-L33
https://github.com/explodinggradients/ragas/blob/c40891bf168de3124c845b75af31ed557eb79709/src/ragas/testset/docstore.py#L82-L86
For example, Document.embedding is supposed to have None as its default value. However, an instance of a Field object is being set instead. As a result, in the following section the condition `n.embedding is None` evaluates to False, leading to incorrect behavior.
https://github.com/explodinggradients/ragas/blob/c40891bf168de3124c845b75af31ed557eb79709/src/ragas/testset/docstore.py#L233
It seems that using langchain_core.pydantic_v1 is also causing issues with the values of Fields in classes that inherit from BaseModel.
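A minimal sketch of the mechanism I suspect (the class and field names are illustrative, not the actual ragas code; behavior assumes langchain-core 0.3 with Pydantic v2 installed):

```python
from pydantic import BaseModel  # Pydantic v2, as used by LangChain v0.3
from langchain_core.pydantic_v1 import Field  # v1-style FieldInfo

class Node(BaseModel):
    # Pydantic v2 does not recognize the v1 FieldInfo as a field marker,
    # so the FieldInfo itself becomes the default value instead of None.
    embedding: object = Field(default=None)

n = Node()
print(n.embedding is None)  # False: embedding holds a v1 FieldInfo instance
```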
Related issue

The same error is described in the following issue: https://github.com/explodinggradients/ragas/issues/1319