Open Vtej98 opened 4 months ago
Hey @Vtej98, can you try it with a limited number of documents first? Also set the with_debugging_logs argument to True so that we have better context on where it is getting stuck. And which version of ragas are you using?
Hello @shahules786 ,
The number of documents used is 1, and it contains barely 9 pages.
Filename and doc_id are the same for all nodes.
Generating: 0%| | 0/5 [00:00<?, ?it/s]
[ragas.testset.filters.DEBUG] node filter: {'score': 7.5}
[ragas.testset.evolutions.DEBUG] keyphrases in merged node: ['Cyberlife Browser', 'Ring alert', 'Correspondence queue', 'Confirmation letter', 'POA document']
[ragas.testset.filters.DEBUG] node filter: {'score': 7.5}
[ragas.testset.evolutions.DEBUG] keyphrases in merged node: ['Cyberlife Browser', 'Ring alert', 'Correspondence queue', 'Confirmation letter', 'POA document']
[ragas.testset.filters.DEBUG] node filter: {'score': 4.0}
[ragas.testset.evolutions.INFO] retrying evolution: 0 times
[ragas.testset.filters.DEBUG] node filter: {'score': 8.0}
[ragas.testset.evolutions.DEBUG] keyphrases in merged node: ['Cyberlife Browser', 'Ring alert', 'Correspondence queue', 'Confirmation letter', 'POA document']
[ragas.testset.filters.DEBUG] node filter: {'score': 4.0}
[ragas.testset.evolutions.INFO] retrying evolution: 0 times
[ragas.testset.filters.DEBUG] node filter: {'score': 4.0}
[ragas.testset.evolutions.INFO] retrying evolution: 0 times
[ragas.testset.filters.DEBUG] node filter: {'score': 7.5}
[ragas.testset.evolutions.DEBUG] keyphrases in merged node: ['Cyberlife Browser', 'Ring alert', 'Correspondence queue', 'Confirmation letter', 'POA document']
[ragas.testset.evolutions.INFO] seed question generated: What is the process for sending a confirmation letter after removing an AIF/POA or guardian/conservator from the system?
[ragas.testset.filters.DEBUG] node filter: {'score': 4.0}
[ragas.testset.evolutions.INFO] retrying evolution: 0 times
[ragas.testset.filters.DEBUG] node filter: {'score': 4.0}
[ragas.testset.evolutions.INFO] retrying evolution: 0 times
[ragas.testset.filters.DEBUG] filtered question: {'reason': 'The question is specific and refers to a particular process, making it clear and answerable.', 'verdict': '1'}
[ragas.testset.filters.DEBUG] node filter: {'score': 4.0}
[ragas.testset.evolutions.INFO] retrying evolution: 0 times
[ragas.testset.filters.DEBUG] node filter: {'score': 4.0}
[ragas.testset.evolutions.INFO] retrying evolution: 0 times
[ragas.testset.filters.DEBUG] node filter: {'score': 4.0}
[ragas.testset.evolutions.INFO] retrying evolution: 0 times
[ragas.testset.evolutions.DEBUG] answer generated: {'answer': 'Send a ring alert to the Correspondence queue per the Ring d.Alert Job Aid (OPE-007) in Related resources to have a confirmation letter sent.', 'verdict': '1'}
Generating: 20%| | 1/5 [00:10<00:42, 10.75s/it]
[ragas.testset.filters.DEBUG] node filter: {'score': 7.5}
[ragas.testset.evolutions.DEBUG] keyphrases in merged node: ['Cyberlife Browser', 'Ring alert', 'Correspondence queue', 'Confirmation letter', 'POA document']
[ragas.testset.evolutions.INFO] seed question generated: What should be done after removing an AIF/POA or guardian/conservator from the system in order to have a confirmation letter sent?
[ragas.testset.filters.DEBUG] filtered question: {'reason': 'The question is clear and specific, referring to a particular process and desired outcome.', 'verdict': '1'}
[ragas.testset.evolutions.DEBUG] [ReasoningEvolution] simple question generated: What should be done after removing an AIF/POA or guardian/conservator from the system in order to have a confirmation letter sent?
[ragas.testset.filters.DEBUG] filtered question: {'reason': 'The question is clear and specific, referring to a particular action in a system and asking for the subsequent steps.', 'verdict': '1'}
[ragas.testset.evolutions.DEBUG] [ReasoningEvolution] question compressed: After removing an AIF/POA or guardian/conservator from the system, what action should be taken to ensure that a confirmation letter is sent?
[ragas.testset.filters.DEBUG] evolution filter: {'reason': 'Both questions ask for the same procedure to be followed after removing an AIF/POA or guardian/conservator from the system to ensure a confirmation letter is sent. They share the same depth, breadth, and requirements.', 'verdict': '1'}
[ragas.testset.evolutions.DEBUG] evolution_filter failed, retrying with 1
[ragas.testset.evolutions.INFO] retrying evolution: 1 times
[ragas.testset.filters.DEBUG] node filter: {'score': 7.5}
[ragas.testset.evolutions.DEBUG] keyphrases in merged node: ['POA request', 'Address of record', 'Power of Attorney', 'Notarized signature', 'AIF/POA']
[ragas.testset.evolutions.INFO] seed question generated: What are the requirements for a Power of Attorney document to be considered valid?
[ragas.testset.filters.DEBUG] filtered question: {'reason': 'The question is clear and specific, referring to a particular legal document and its validity requirements.', 'verdict': '1'}
[ragas.testset.evolutions.DEBUG] [ReasoningEvolution] simple question generated: What are the requirements for a Power of Attorney document to be considered valid?
[ragas.testset.filters.DEBUG] filtered question: {'reason': 'The question is clear and specific, outlining the exact information needed regarding a Power of Attorney document.', 'verdict': '1'}
[ragas.testset.evolutions.DEBUG] [ReasoningEvolution] question compressed: What are the criteria for a Power of Attorney document to be considered valid, including the requirements for updating the address of record and the acceptable timeframe for the document's date?
[ragas.testset.filters.DEBUG] evolution filter: {'reason': 'The second question includes additional specific requirements (address updates and document date timeframe) that are not present in the first question, leading to a different depth of inquiry.', 'verdict': '0'}
[ragas.testset.evolutions.DEBUG] answer generated: {'answer': "XXXXX I just hidden the answer, but it has the answer correctly", 'verdict': '1'}
Generating: 40%| | 2/5 [00:40<01:06, 22.08s/it]
It's freezing here and draining OpenAI tokens.
Python version: 3.8, ragas==0.1.2
I also tried Python 3.12, building ragas from source.
It's the same thing.
The same happens with the latest version of ragas as well; I also tried ragas 0.1.3.
It just freezes after generating 1, 3, or 4 questions.
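As a stopgap while the hang is being investigated, the generation call can be bounded with a worker-thread timeout so the caller at least regains control. This is a minimal plain-Python sketch (not part of ragas); note it cannot kill a hung worker thread, which may keep consuming tokens in the background until the process exits:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def run_with_timeout(fn, timeout_s, *args, **kwargs):
    """Run fn in a worker thread and give up after timeout_s seconds.

    Caveat: on timeout the worker thread is NOT killed -- a stuck
    generation call keeps running (and may keep spending tokens) until
    the process exits. This only hands control back to the caller so
    you can log the failure and abort cleanly.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, *args, **kwargs)
    try:
        return future.result(timeout=timeout_s)
    finally:
        pool.shutdown(wait=False)  # don't block on a stuck worker
```

Hypothetical usage against the generator discussed here: `run_with_timeout(generator.generate_with_langchain_docs, 600, documents, test_size=3)`.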
The same thing happens when calling generate_with_llamaindex_docs(): it gets stuck forever and consumes tokens heavily in the background. There seems to be a threading lock problem.
+1, I've got the same issue.
It gets stuck at this, doesn't generate anything (but also doesn't consume tokens).
Same here, it gets stuck. Looking for a solution.
UPDATE: When I tried installing from source, like
git clone https://github.com/explodinggradients/ragas && cd ragas
pip install -e .
the issue seems to be solved.
There is a new issue though. I couldn't generate a dataset from a single JSON file; I got the "Filename and doc_id are the same for all nodes." error. But that's okay, I wasn't going to generate a dataset from a single file anyway.
Hope it helps.
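For the "Filename and doc_id are the same for all nodes." warning, a commonly suggested workaround is to give each loaded document a distinct `filename` metadata entry before generation. The sketch below uses a minimal stand-in for langchain's `Document` class; that ragas keys nodes on a `filename` metadata field is an assumption based on the warning text:

```python
class Document:
    """Minimal stand-in for langchain's Document (page_content + metadata)."""
    def __init__(self, page_content, metadata=None):
        self.page_content = page_content
        self.metadata = metadata or {}

docs = [
    Document("chunk one", {"source": "a.json"}),
    Document("chunk two", {"source": "b.json"}),
]

# Copy each loader's `source` path into `filename` so every document
# gets a distinct doc_id instead of all nodes collapsing into one.
for doc in docs:
    doc.metadata["filename"] = doc.metadata["source"]
```

With real loaders the same loop runs over the list returned by `loader.load()` before calling the generator.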
I'm facing the same problem too. I found that when I work with documents that are short and in Japanese, the generating phase goes into a continuous loop. When I work with documents that are long and in English, it works as I expected. Moreover, when I use documents that are long and in Japanese, I see a tendency to loop more on the same question (using context_scoring_prompt), but it doesn't go into a continuous loop.
@Kelp710 did you do automatic language adaptation before using it with Japanese documents? https://docs.ragas.io/en/stable/howtos/applications/use_prompt_adaptation.html
@shahules786 Thank you for the advice. I had not used automatic language adaptation, so I tried it. However, it no longer takes into account the PDF document I use, and it generates irrelevant questions/answers.
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from ragas.testset.evolutions import multi_context, reasoning, simple, conditional
from ragas.testset.generator import TestsetGenerator
import os
import uuid
from llama_index.core import SimpleDirectoryReader

unique_id = uuid.uuid4().hex[0:8]
os.environ["LANGCHAIN_PROJECT"] = f"Tracing Walkthrough - {unique_id}"

loader = SimpleDirectoryReader("./doc")
query_space = "large language models"
documents = loader.load_data()

# generator with openai models
generator_llm = ChatOpenAI(model="gpt-3.5-turbo-16k")
critic_llm = ChatOpenAI(model="gpt-3.5-turbo-16k")
embeddings = OpenAIEmbeddings()

generator = TestsetGenerator.from_langchain(generator_llm, critic_llm, embeddings)
generator.adapt("japanese", evolutions=[simple, reasoning, conditional, multi_context])

distributions = {simple: 0.2, multi_context: 0.4, reasoning: 0.1, conditional: 0.3}

# generate testset
testset = generator.generate_with_langchain_docs(documents, 3, distributions, with_debugging_logs=True)
testset = testset.to_pandas()
Getting the same error (with llama-index, running both in a notebook and as a script). It keeps going even after setting the number of docs to 1:
..
Filename and doc_id are the same for all nodes.
Generating: 0%| | 0/1 [00:00<?, ?it/s]
[ragas.testset.filters.DEBUG] node filter: {'score': 1.0}
[ragas.testset.evolutions.INFO] retrying evolution: 0 times
[ragas.testset.filters.DEBUG] node filter: {'score': 4.0}
[ragas.testset.evolutions.INFO] retrying evolution: 0 times
[ragas.testset.filters.DEBUG] node filter: {'score': 4.0}
[ragas.testset.evolutions.INFO] retrying evolution: 0 times
[ragas.testset.filters.DEBUG] node filter: {'score': 4.5}
[ragas.testset.evolutions.INFO] retrying evolution: 0 times
[ragas.testset.filters.DEBUG] node filter: {'score': 0.0}
[ragas.testset.evolutions.INFO] retrying evolution: 0 times
[ragas.testset.filters.DEBUG] node filter: {'score': 3.5}
[ragas.testset.evolutions.INFO] retrying evolution: 0 times
[ragas.testset.filters.DEBUG] node filter: {'score': 1.0}
[ragas.testset.evolutions.INFO] retrying evolution: 0 times
[ragas.testset.filters.DEBUG] node filter: {'score': 3.5}
[ragas.testset.evolutions.INFO] retrying evolution: 0 times
As @asti009asti mentioned, this seems to be a threading issue. @shahules786 do you have any pointers as to what might be causing this? Happy to contribute.
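For intuition on the retry behavior in these logs: passing nodes score around 7.5-8.0 while retried ones score 4.0 and below, which suggests a filter threshold somewhere in between. This pure-Python illustration (not ragas source; the threshold value is an assumption read off the logs) shows why a document whose chunks only ever score low can loop indefinitely unless the retries are capped:

```python
def generate_with_retries(score_fn, threshold=7.5, max_tries=5):
    """Keep re-scoring a node until it clears the filter threshold.

    Illustrative only. With no cap on retries, a document whose chunks
    always score below the threshold would spin forever -- matching the
    endless 'retrying evolution' lines in the logs above.
    """
    for attempt in range(max_tries):
        if score_fn() >= threshold:
            return attempt  # number of retries that were needed
    return None  # gave up after max_tries attempts

# Chunks that always score 4.0 (as in the log) never pass the filter.
assert generate_with_retries(lambda: 4.0) is None
```

A hard cap plus `raise_exceptions=False` (shown in a later comment) at least turns the infinite loop into a bounded failure.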
Stuck generating at 50%, and unable to stop the Python interpreter either.
Tried installing from source as @omerfguzel mentioned, running on just a single English markdown document. Tried several times, with several different models, but every run got stuck at 50%.
The document: https://github.com/awsdocs/aws-doc-sdk-examples/tree/main/python/cross_service/apigateway_covid-19_tracker
import os
from pathlib import Path
from langchain_community.document_loaders import DirectoryLoader
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from ragas.testset.evolutions import multi_context, reasoning, simple
from ragas.testset.generator import TestsetGenerator
from get_secrets import project_secrets
os.environ["OPENAI_API_KEY"] = project_secrets['openai_token']
d = Path(__file__).parent / 'documents'
loader = DirectoryLoader(str(d))
documents = loader.load()
# generator with openai models
generator_llm = ChatOpenAI(model="gpt-3.5-turbo-16k")
critic_llm = ChatOpenAI(model="gpt-3.5-turbo-16k")
embeddings = OpenAIEmbeddings()
generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings,
)
# generate testset
testset = generator.generate_with_langchain_docs(
    documents,
    test_size=3,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
)
testset.to_dataset().save_to_disk(d / 'testset')
A couple of notes here, as it might help:
The generation process gets stuck if you don't adapt the prompts to your language. You should call the adapt() method on your TestsetGenerator object. This solves the issue.
generator = TestsetGenerator.with_openai(
    generator_llm=llm,
    critic_llm=critic_llm,
    embeddings=embeddings_llm,
)
...
generator.adapt(language="whatever", evolutions=[simple, multi_context, conditional, reasoning])
generator.save(evolutions=[simple, multi_context, reasoning, conditional])
...
question_dataset = generator.generate_with_llamaindex_docs(
    run_config=run_config,
    documents=documents,
    test_size=number_of_questions,
    distributions=distributions,
    is_async=False,
    raise_exceptions=False,
)
I am not sure whether the continuous loop is due to the local environment. When I switch to Google Colab, instead of running the Python file in Windows PowerShell, it works. Generation through the API is still slow, though.
Any update on this issue? I am also facing it with the latest ragas library; it always gets stuck at 90% of test generation.
I am trying to generate a testset by following the guide from Ragas, and it's stuck at 14% while embedding nodes.
Any updates about this issue would be helpful.
Facing the same issue. Any update on this?
When I run this code in a Jupyter notebook I am able to create the dataset, but when I run it through the CLI it goes into a continuous loop.
Question: I am not sure what's happening. The testset data isn't generating; it just goes into a continuous loop, exhausting my OpenAI tokens.
My code:

from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain.document_loaders import DirectoryLoader
import os

OPENAI_API_KEY = "sk-xxxxxx"
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

def load_docs(directory):
    loader = DirectoryLoader(directory)
    documents = loader.load()
    return documents

documents = load_docs("./source")
for document in documents:
    document.metadata['file_name'] = document.metadata['source']

generator = TestsetGenerator.with_openai()

testset = generator.generate_with_langchain_docs(documents, test_size=10, distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25})
testset.to_pandas()
testset.to_pandas().to_excel('output_data.xlsx', index=False)
Additional context: I explored the code and found that it has a retry mechanism of 15 retries with a wait time of 90 seconds, but I still waited a long time with no sign of completion.