[R-239] Why the later parts testset will never be accessed?

princepride commented 5 months ago

The following code in testset/generator.py confuses me:

current_nodes = [
    CurrentNodes(root_node=n, nodes=[n])
    for n in self.docstore.get_random_nodes(k=test_size)
]
total_evolutions = 0
for evolution, probability in distributions.items():
    for i in range(round(probability * test_size)):
        exec.submit(
            evolution.evolve,
            current_nodes[i],
            name=f"{evolution.__class__.__name__}-{i}",
        )
        total_evolutions += 1

Your Question

In this code, the inner loop index i restarts at 0 for every evolution, so the current_nodes indices used for each distribution are all concentrated at the front of the current_nodes list. Does that mean the later entries will never be accessed?

Code Examples

from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

generator_llm = ChatOpenAI(model="gpt-3.5-turbo-16k")
critic_llm = ChatOpenAI(model="gpt-4")
embeddings = OpenAIEmbeddings()

generator = TestsetGenerator.from_langchain(generator_llm, critic_llm, embeddings)

testset = generator.generate_with_langchain_docs(
    documents,
    test_size=10,
    distributions={simple: 0.5, reasoning: 0.3, multi_context: 0.2},
)

With the sample code from the Ragas documentation (test_size=10), the simple evolution uses randomly selected nodes No. 1-5, the reasoning evolution uses No. 1-3, and the multi_context evolution uses No. 1-2. Nodes No. 6-10 are never used.
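The overlap can be reproduced without Ragas or any LLM calls. The sketch below only mimics the index arithmetic of the loop quoted above; the function name accessed_indices and the string keys standing in for the evolution objects are hypothetical, and the indices are 0-based, so index 0 corresponds to "No. 1".

```python
def accessed_indices(test_size, distributions):
    """Collect every current_nodes index the quoted loop would read."""
    touched = set()
    for name, probability in distributions.items():
        # The inner index restarts at 0 for each evolution, as in the issue.
        for i in range(round(probability * test_size)):
            touched.add(i)
    return touched


# String keys stand in for the simple, reasoning and multi_context evolutions.
distributions = {"simple": 0.5, "reasoning": 0.3, "multi_context": 0.2}
test_size = 10

touched = accessed_indices(test_size, distributions)
untouched = sorted(set(range(test_size)) - touched)

print("accessed:", sorted(touched))
print("never accessed:", untouched)
```

Only the first five indices are ever read, and the remaining five nodes are sampled but never passed to any evolution.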


jjmachan commented 4 months ago

wow that is an embarrassing bug @princepride - thanks a lot for reporting that 😅

will fix it shortly

princepride commented 4 months ago

I tried to fix this bug and submitted a pull request for it: https://github.com/explodinggradients/ragas/pull/880
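For readers following along, one possible way to avoid the overlap is to carry a running offset across evolutions so that each one gets its own slice of the sampled nodes. This is only a sketch with hypothetical names (assign_nodes, string keys in place of the evolution objects), and it is not necessarily what the pull request does.

```python
def assign_nodes(test_size, distributions):
    """Map each evolution name to a disjoint, consecutive range of node indices."""
    assignments = {}
    offset = 0
    for name, probability in distributions.items():
        count = round(probability * test_size)
        # Clamp to test_size in case rounding makes the counts sum past it.
        end = min(offset + count, test_size)
        assignments[name] = list(range(offset, end))
        offset = end
    return assignments


distributions = {"simple": 0.5, "reasoning": 0.3, "multi_context": 0.2}
assignments = assign_nodes(10, distributions)

for name, indices in assignments.items():
    print(name, indices)
```

With test_size=10 and the distribution from the issue, every index from 0 to 9 is assigned to exactly one evolution.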

[R-239] Why the later parts testset will never be accessed? #860