explodinggradients / ragas

Supercharge Your LLM Application Evaluations 🚀
https://docs.ragas.io
Apache License 2.0
7.3k stars 746 forks

`testset_size` Parameter Not Generating Correct Number of Samples #1670

Closed Jayashree-kalabhavi closed 1 week ago

Jayashree-kalabhavi commented 1 week ago

[ ] I have checked the documentation and related resources and couldn't resolve my bug.

**Describe the bug**
I specified `testset_size=5` in my code when generating a test set using the `TestsetGenerator`, but the resulting dataset contained 7 samples instead of the expected 5.

Code Used:

Ragas version: Python version:

Code to Reproduce


from ragas.testset import TestsetGenerator

generator = TestsetGenerator(llm=generator_llm, embedding_model=generator_embeddings)
docs = loader.load()
dataset = generator.generate_with_langchain_docs(docs, testset_size=5)

**Error trace**

**Expected behavior**
The dataset should contain exactly 5 samples.

**Additional context**
Additional Information:

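from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.llms import LangchainLLMWrapper

# Initialize the LLM wrapper
generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4"))

# Initialize the Embeddings wrapper
generator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())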

Document loader: `from langchain_community.document_loaders import WebBaseLoader`
Number of documents loaded: [10]

jjmachan commented 1 week ago

hey @Jayashree-kalabhavi that is the expected behaviour 🙂

the idea was to round up for the distributions, so that you end up with more test cases to review afterwards
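To make the rounding concrete, here is a minimal sketch of how rounding each share up can turn a request for 5 samples into 7. The three-way split of 0.5 / 0.25 / 0.25 and the helper name `rounded_up_counts` are assumptions chosen for illustration; the thread does not state the actual distribution or how the library computes it.

```python
import math


def rounded_up_counts(testset_size, weights):
    """Return per-category sample counts, rounding each share up."""
    return [math.ceil(testset_size * w) for w in weights]


# Hypothetical split across three query types (assumed, not taken from the thread).
weights = [0.5, 0.25, 0.25]

counts = rounded_up_counts(5, weights)
total = sum(counts)

# Shares of 2.5, 1.25 and 1.25 round up to 3, 2 and 2.
print(counts, total)  # [3, 2, 2] 7
```

Under these assumed weights, every fractional share gains up to one extra sample, so the total can exceed `testset_size` by as much as the number of categories.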

I'm closing this now

Jayashree-kalabhavi commented 1 week ago

Thanks @jjmachan. Is it expected behavior for the same questions to appear in the test set? Additionally, the generated user inputs contain spelling mistakes. Is there something I should do to fix this?

Attaching the user input for reference.


user_input:
- What does the term 'The Site of the University of Missouri' refer to according to the Board Bylaws?
- What is the role of the Presidnet in the University of Missouri as per the Board of Curators?
- Who can call a spesial meting of the Board?
- What are the responsibilities of the Board of Curators as per the Board Bylaws?
- who appoints general counsel?
- What are the responsibilities of the Board of Curators at the University of Missouri?
- What are the responsibilities of the Board of Curators at the University of Missouri?
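One possible workaround for the repeated questions and the extra samples is to post-process the generated questions by hand. The sketch below is plain Python over a list of strings and does not use any Ragas API; the helper name `dedupe_and_trim` is hypothetical. It drops questions that match an earlier one after lowercasing and whitespace normalization, then keeps at most the requested number. It does not correct spelling mistakes.

```python
def dedupe_and_trim(questions, size):
    """Drop repeated questions (case- and whitespace-insensitive), keep at most `size`."""
    seen = set()
    unique = []
    for question in questions:
        # Normalize so trivial differences do not hide a duplicate.
        key = " ".join(question.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(question)
    return unique[:size]


sample = [
    "Who can call a spesial meting of the Board?",
    "who appoints general counsel?",
    "What are the responsibilities of the Board of Curators at the University of Missouri?",
    "What are the responsibilities of the Board of Curators at the University of Missouri?",
]

cleaned = dedupe_and_trim(sample, 5)
print(len(cleaned))  # 3
```

Exact-match deduplication will not catch near-duplicates that are worded differently, such as the two "responsibilities of the Board of Curators" variants above; those would still need manual review.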