[ ] I checked the documentation and related resources and couldn't find an answer to my question.
Your Question
Test set generation with a generator built by TestsetGenerator.from_langchain fails: the progress bar gets stuck at a random point between 0% and 80%, and I keep consuming tokens from the OpenAI API the whole time.
By the way, my source documents are in Chinese. Could that be affecting this?
I am a student who has just started programming, so my background knowledge is limited. I would be very grateful if someone could make sense of my incomplete question and help me.
Filename and doc_id are the same for all nodes.
Generating: 70%|██████████████████████████████████████████████████████▌ | 7/10 [01:29<00:43, 14.33s/it]
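The log line above mentions filenames, and as far as I can tell the Ragas documentation for this generator expects a `filename` key in each document's metadata. Chunks made with `create_documents` from a raw string carry no metadata at all. Below is a minimal sketch, assuming that this key matters here; `assign_filenames` is a hypothetical helper, not part of Ragas or LangChain, and I have not verified that it removes the message or the stall.

```python
from types import SimpleNamespace


def assign_filenames(documents, source_name):
    """Set metadata['filename'] on every chunk that lacks one (hypothetical helper).

    Assumes each item exposes a dict `.metadata` attribute, as LangChain
    Document objects do.
    """
    for document in documents:
        # setdefault keeps a filename that is already present.
        document.metadata.setdefault("filename", source_name)
    return documents


# Stand-in objects so the sketch runs without LangChain installed.
chunks = [
    SimpleNamespace(page_content="第一节", metadata={}),
    SimpleNamespace(page_content="第二节", metadata={"filename": "kept.md"}),
]
assign_filenames(chunks, "output.md")
print([chunk.metadata["filename"] for chunk in chunks])
```

With real LangChain documents the call would be `assign_filenames(documents, "output.md")` right after the splitter runs and before generation starts.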
Code Examples
import os

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from ragas.testset.evolutions import conditional, multi_context, reasoning, simple
from ragas.testset.generator import TestsetGenerator

os.environ["OPENAI_API_KEY"] = "sk-xxx"

# Raw string so the backslash in the Windows path is never read as an escape.
with open(r"RAGAS\output.md", encoding="utf-8") as f:
    state_of_the_union = f.read()

text_splitter = RecursiveCharacterTextSplitter(
    # Sizes are measured in characters because length_function is len.
    chunk_size=512,
    chunk_overlap=128,
    length_function=len,
    is_separator_regex=False,
    # Split only on level-3 Markdown headings.
    separators=["###"],
)
documents = text_splitter.create_documents([state_of_the_union])
print(documents[0])

# The same model is used for generating and for critiquing questions.
generator_llm = ChatOpenAI(model="gpt-3.5-turbo")
critic_llm = ChatOpenAI(model="gpt-3.5-turbo")
embeddings = OpenAIEmbeddings()

generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings,
)

# Adapt the prompts to Chinese and save the adapted versions.
generator.adapt(language="chinese", evolutions=[simple, multi_context, conditional, reasoning])
generator.save(evolutions=[simple, reasoning, multi_context, conditional])

testset = generator.generate_with_langchain_docs(
    documents,
    test_size=10,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
    is_async=False,
)

# Convert once, then write the result to CSV.
df = testset.to_pandas()
df.to_csv(r"RAGAS\output.csv", index=False)
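Before spending tokens, it may help to look at what the splitter actually produced. With "###" as the only separator, a section longer than chunk_size has nothing further to split on and may come through oversized, and empty or whitespace-only chunks are possible too. The sketch below uses only the standard library; `summarize_chunks` is a hypothetical helper, and the sample strings are made up for illustration.

```python
def summarize_chunks(texts, chunk_size=512):
    """Return basic length statistics for a list of chunk strings."""
    lengths = [len(text) for text in texts]  # len counts characters, not tokens
    return {
        "count": len(lengths),
        "empty": sum(1 for text in texts if not text.strip()),
        "oversized": sum(1 for n in lengths if n > chunk_size),
        "min_chars": min(lengths, default=0),
        "max_chars": max(lengths, default=0),
    }


# Made-up sample: one normal chunk, one blank chunk, one oversized chunk.
sample = ["### 标题一\n内容" * 3, "   ", "x" * 600]
stats = summarize_chunks(sample)
print(stats)
```

For the code above, the call would be `summarize_chunks([d.page_content for d in documents])`. Many empty or oversized chunks would suggest fixing the splitting first; whether that is related to the stall is only a guess.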
Additional context
Anything else you want to share with us?