explodinggradients / ragas

Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines
https://docs.ragas.io
Apache License 2.0
6.54k stars 641 forks

TestsetGenerator.from_langchain Generating failed, randomly stuck at 0% to 80%. #1003

Open pp6699 opened 3 months ago

pp6699 commented 3 months ago

[ ] I checked the documentation and related resources and couldn't find an answer to my question.

Your Question: TestsetGenerator.from_langchain generation fails, randomly getting stuck anywhere between 0% and 80%, while still consuming tokens from the OpenAI API.

By the way, the documents I am embedding are in Chinese. Could that be contributing to the problem?

I am a student who has just started programming, and my knowledge in this area is limited. I would be very grateful if someone more experienced could make sense of my incomplete question and help me.

Filename and doc_id are the same for all nodes.
Generating:  70%|██████████████████████████████████████████████████████▌                       | 7/10 [01:29<00:43, 14.33s/it]
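The "Filename and doc_id are the same for all nodes" warning usually means every chunk carries identical metadata, so the document store cannot tell the chunks apart. One possible workaround is to give each chunk distinct metadata before generation. A minimal sketch using plain dicts as stand-ins for langchain `Document` objects (the `filename` values and chunk texts here are made up for illustration):

```python
# Pure-Python sketch: attach a distinct filename to each chunk's metadata
# so downstream tooling can distinguish the nodes.
chunks = ["第一段###", "第二段###", "第三段###"]

docs = [
    {"page_content": text, "metadata": {"filename": f"output.md#chunk{i}"}}
    for i, text in enumerate(chunks)
]

print(docs[0]["metadata"]["filename"])  # → output.md#chunk0
```

With langchain, the same idea would be expressed by passing per-chunk `metadatas` (or constructing `Document(page_content=..., metadata=...)` objects) instead of bare strings.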

Code Examples

from langchain_text_splitters import RecursiveCharacterTextSplitter
import os
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

os.environ["OPENAI_API_KEY"] = "sk-xxx"

with open("RAGAS\output.md", encoding='utf-8') as f:
    state_of_the_union = f.read()

text_splitter = RecursiveCharacterTextSplitter(
    # Set a really small chunk size, just to show.
    chunk_size=512,
    chunk_overlap=128,
    length_function=len,
    is_separator_regex=False,
    separators=["###"],
)

documents = text_splitter.create_documents([state_of_the_union])
print(documents[0])

generator_llm = ChatOpenAI(model="gpt-3.5-turbo")
critic_llm = ChatOpenAI(model="gpt-3.5-turbo")
embeddings = OpenAIEmbeddings()

generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings
)
from ragas.testset.evolutions import conditional  # simple, reasoning, multi_context already imported above

generator.adapt(language="chinese", evolutions=[simple, multi_context, conditional, reasoning])
generator.save(evolutions=[simple, reasoning, multi_context, conditional])

testset = generator.generate_with_langchain_docs(
    documents,
    test_size=10,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
    is_async=False,
)

testset.to_pandas()

testset.to_pandas().to_csv(r"RAGAS\output.csv", index=False)

Additional context Anything else you want to share with us?

huangxuyh commented 1 month ago

It gets stuck; my guess is it's a problem with the JSON output.
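If the hang really is caused by the model returning malformed or wrapped JSON, a tolerant parser can at least surface the failure instead of silently stalling. A rough sketch of that idea (the `parse_llm_json` helper is hypothetical, not a ragas API):

```python
import json
import re

def parse_llm_json(raw: str):
    """Try to parse JSON from an LLM reply that may include extra text.

    Returns the parsed object, or None if no valid JSON can be recovered.
    (Hypothetical helper for illustration; not part of ragas.)
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # Fallback: grab the first {...} span (e.g. inside a markdown fence).
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match:
        try:
            return json.loads(match.group(0))
        except json.JSONDecodeError:
            return None
    return None

# Typical failure mode: the model wraps its JSON in a markdown code fence.
reply = '```json\n{"question": "什么是RAG?", "answer": "检索增强生成"}\n```'
print(parse_llm_json(reply))
```

Logging the raw reply whenever the fallback path is taken would also make it obvious whether the Chinese output is what trips up the parser.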

hzishan commented 3 days ago

It looks like an unsupported encoding issue: in "~\miniconda3\envs\my_env\Lib\site-packages\ragas\llms\prompt.py", line 286, set encoding="utf-8".
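For reference, the failure mode is easy to reproduce: `open()` without an explicit encoding uses the OS locale encoding (e.g. cp936 on a Chinese-locale Windows install), which can raise `UnicodeDecodeError` on a UTF-8 file containing Chinese text. A small sketch, assuming a prompt-cache-style JSON file (the file name and content are made up):

```python
import json
import tempfile
from pathlib import Path

# Write a JSON file with Chinese text explicitly as UTF-8.
cache = Path(tempfile.mkdtemp()) / "prompt_zh.json"
data = {"instruction": "根据上下文生成一个问题"}

with open(cache, "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False)

# Reading it back with an explicit encoding is portable; open(cache)
# without encoding= depends on the OS locale and may fail on Windows.
with open(cache, encoding="utf-8") as f:
    print(json.load(f)["instruction"])  # → 根据上下文生成一个问题
```

This is why pinning `encoding="utf-8"` at the read site in prompt.py would plausibly fix adapted Chinese prompts that a default-locale read cannot decode.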