Open Jayashree-kalabhavi opened 1 day ago
Hey @Jayashree-kalabhavi
1) Duplicate questions: we will take a look at this. It is related to one of the issues on the roadmap.
2) Spelling mistakes: these are introduced by the query style (misspelled queries). As of now the query style is not fully configurable by the user, but we will add that to the roadmap too.
Thanks for reporting.
[ ] I checked the documentation and related resources and couldn't find an answer to my question.
Your Question
When using the TestsetGenerator from the ragas.testset module, I am encountering the following issues:
- Duplicate questions: The generated test set often contains repeated questions.
- Spelling mistakes: The generated questions contain spelling errors (e.g., "Presidnet" instead of "President", "spesial meting" instead of "special meeting").
Code Examples
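# Imports assumed from the usual ragas + langchain-openai setup (not shown in the original post)
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.testset import TestsetGenerator
# Initialize the LLM wrapper
generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4"))
# Initialize the Embeddings wrapper
generator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())
# docs is the list of LangChain documents loaded earlier (loading code not shown)
generator = TestsetGenerator(llm=generator_llm, embedding_model=generator_embeddings)
dataset = generator.generate_with_langchain_docs(docs, testset_size=5)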
Additional context
Sample Output:
- What does the term 'The Site of the University of Missouri' refer to according to the Board Bylaws?
- What is the role of the Presidnet in the University of Missouri as per the Board of Curators?
- Who can call a spesial meting of the Board?
- What are the responsibilities of the Board of Curators as per the Board Bylaws?
- What are the responsibilities of the Board of Curators at the University of Missouri?
- What are the responsibilities of the Board of Curators at the University of Missouri?
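Until duplicate handling lands in the library, the exact repeats in the sample output could probably be filtered out after generation. The snippet below is only a sketch of that idea in plain Python. The helper names `normalize` and `drop_duplicate_questions` are made up for illustration and are not part of ragas, and it assumes the generated questions have already been pulled out of the dataset into a list of strings. It only catches repeats that match after lowercasing and stripping punctuation, so paraphrased near-duplicates would still get through.

```python
import re


def normalize(question: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", question.lower())
    return " ".join(text.split())


def drop_duplicate_questions(questions):
    """Keep the first occurrence of each question, preserving order.

    Hypothetical helper, not a ragas API.
    """
    seen = set()
    unique = []
    for q in questions:
        key = normalize(q)
        if key not in seen:
            seen.add(key)
            unique.append(q)
    return unique


# Questions copied from the sample output in this issue.
sample = [
    "What does the term 'The Site of the University of Missouri' refer to according to the Board Bylaws?",
    "What is the role of the Presidnet in the University of Missouri as per the Board of Curators?",
    "Who can call a spesial meting of the Board?",
    "What are the responsibilities of the Board of Curators as per the Board Bylaws?",
    "What are the responsibilities of the Board of Curators at the University of Missouri?",
    "What are the responsibilities of the Board of Curators at the University of Missouri?",
]

deduped = drop_duplicate_questions(sample)
print(f"{len(sample)} questions in, {len(deduped)} out")
```

Keeping the first occurrence and preserving order means the rest of the test set rows can be filtered with the same indices if needed.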