explodinggradients / ragas

Supercharge Your LLM Application Evaluations 🚀
https://docs.ragas.io
Apache License 2.0

Testset generation broken after migrating from 0.1.x to 0.2.4 #1660

Open malikbrh opened 1 week ago

malikbrh commented 1 week ago

[X] I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug

I get a ZeroDivisionError while generating my testset. The same code structure worked fine in v0.1; it broke after migrating to 0.2. I tried multiple models and finally settled on OpenAI GPT-4o-mini and text-embedding-3-small.

I added two documents to generate the testset, but it always fails at the same place. When I explore the KnowledgeGraph in the debugger, it looks fine, with multiple Nodes generated by the previous steps.

Ragas version: v0.2.4
Python version: v3.11.10

Code to Reproduce

Note: documents are llama_index documents.

`kg = KnowledgeGraph()

for doc in documents:
    kg.nodes.append(
        Node(
            type=NodeType.DOCUMENT,
            properties={"page_content": doc.text, "document_metadata": doc.metadata}
        )
    )

generator = TestsetGenerator(llm=LlamaIndexLLMWrapper(generator_llm),
                             embedding_model=LlamaIndexEmbeddingsWrapper(embeddings),
                             knowledge_graph=kg)
testset = generator.generate_with_llamaindex_docs(
    documents,
    testset_size=8,
    with_debugging_logs=True
)
print("Testset GENERATED")`

Error trace

Generating personas: 100%|██████████| 3/3 [00:01<00:00,  2.74it/s]
Generating Scenarios:   0%|          | 0/3 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/Users/malik/Documents/ProjetSemestre-5/rag-chatbot-mk/code/evaluation/eval.py", line 202, in <module>
    asyncio.run(evaluation_test())
  File "/Users/malik/.pyenv/versions/3.11.10/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/Users/malik/.pyenv/versions/3.11.10/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/malik/.pyenv/versions/3.11.10/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/Users/malik/Documents/ProjetSemestre-5/rag-chatbot-mk/code/evaluation/eval.py", line 150, in evaluation_test
    testset = await generate_testset_from_documents(generator, documents)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/malik/Documents/ProjetSemestre-5/rag-chatbot-mk/code/evaluation/eval.py", line 91, in generate_testset_from_documents
    testset = generator.generate_with_llamaindex_docs(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/malik/Documents/ProjetSemestre-5/rag-chatbot-mk/code/evaluation/.venv/lib/python3.11/site-packages/ragas/testset/synthesizers/generate.py", line 264, in generate_with_llamaindex_docs
    return self.generate(
           ^^^^^^^^^^^^^^
  File "/Users/malik/Documents/ProjetSemestre-5/rag-chatbot-mk/code/evaluation/.venv/lib/python3.11/site-packages/ragas/testset/synthesizers/generate.py", line 410, in generate
    raise e
  File "/Users/malik/Documents/ProjetSemestre-5/rag-chatbot-mk/code/evaluation/.venv/lib/python3.11/site-packages/ragas/testset/synthesizers/generate.py", line 407, in generate
    scenario_sample_list: t.List[t.List[BaseScenario]] = exec.results()
                                                         ^^^^^^^^^^^^^^
  File "/Users/malik/Documents/ProjetSemestre-5/rag-chatbot-mk/code/evaluation/.venv/lib/python3.11/site-packages/ragas/executor.py", line 200, in results
    results = asyncio.run(self._process_jobs())
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/malik/Documents/ProjetSemestre-5/rag-chatbot-mk/code/evaluation/.venv/lib/python3.11/site-packages/nest_asyncio.py", line 30, in run
    return loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/malik/Documents/ProjetSemestre-5/rag-chatbot-mk/code/evaluation/.venv/lib/python3.11/site-packages/nest_asyncio.py", line 98, in run_until_complete
    return f.result()
           ^^^^^^^^^^
  File "/Users/malik/.pyenv/versions/3.11.10/lib/python3.11/asyncio/futures.py", line 203, in result
    raise self._exception.with_traceback(self._exception_tb)
  File "/Users/malik/.pyenv/versions/3.11.10/lib/python3.11/asyncio/tasks.py", line 277, in __step
    result = coro.send(None)
             ^^^^^^^^^^^^^^^
  File "/Users/malik/Documents/ProjetSemestre-5/rag-chatbot-mk/code/evaluation/.venv/lib/python3.11/site-packages/ragas/executor.py", line 140, in _process_jobs
    result = await future
             ^^^^^^^^^^^^
  File "/Users/malik/.pyenv/versions/3.11.10/lib/python3.11/asyncio/tasks.py", line 615, in _wait_for_one
    return f.result()  # May raise f.exception().
           ^^^^^^^^^^
  File "/Users/malik/.pyenv/versions/3.11.10/lib/python3.11/asyncio/futures.py", line 203, in result
    raise self._exception.with_traceback(self._exception_tb)
  File "/Users/malik/.pyenv/versions/3.11.10/lib/python3.11/asyncio/tasks.py", line 277, in __step
    result = coro.send(None)
             ^^^^^^^^^^^^^^^
  File "/Users/malik/Documents/ProjetSemestre-5/rag-chatbot-mk/code/evaluation/.venv/lib/python3.11/site-packages/ragas/executor.py", line 45, in sema_coro
    return await coro
           ^^^^^^^^^^
  File "/Users/malik/Documents/ProjetSemestre-5/rag-chatbot-mk/code/evaluation/.venv/lib/python3.11/site-packages/ragas/executor.py", line 96, in wrapped_callable_async
    raise e
  File "/Users/malik/Documents/ProjetSemestre-5/rag-chatbot-mk/code/evaluation/.venv/lib/python3.11/site-packages/ragas/executor.py", line 92, in wrapped_callable_async
    result = await callable(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/malik/Documents/ProjetSemestre-5/rag-chatbot-mk/code/evaluation/.venv/lib/python3.11/site-packages/ragas/testset/synthesizers/base.py", line 94, in generate_scenarios
    scenarios = await self._generate_scenarios(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/malik/Documents/ProjetSemestre-5/rag-chatbot-mk/code/evaluation/.venv/lib/python3.11/site-packages/ragas/testset/synthesizers/multi_hop/abstract.py", line 73, in _generate_scenarios
    num_sample_per_cluster = int(np.ceil(n / len(node_clusters)))
                                         ~~^~~~~~~~~~~~~~~~~~~~
ZeroDivisionError: division by zero

Expected behavior

I would expect the testset to be generated properly, or at least a more self-explanatory error. I do not really know what the next best steps to debug this would be.

Additional context

I can provide more details if needed; just ask in the comments and I'll see what I can post, as my project is supposed to stay confidential. Thanks in advance for your help!

malikbrh commented 1 week ago

After a bit more investigation, I ran my code line by line and was able to make the following observations:

The first unexpected value appears in the method flagged by my error trace, ragas.testset.synthesizers.multi_hop.abstract.MultiHopAbstractQuerySynthesizer._generate_scenarios: the node_clusters list, assigned on the first line of this method, is empty.
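To make the failure mode concrete, here is a minimal sketch of the expression from the last traceback frame with a guard in front of it. It is not the library's code: the function name samples_per_cluster and the error message are made up for illustration, and math.ceil stands in for np.ceil so the snippet has no dependencies.

```python
import math


def samples_per_cluster(n: int, node_clusters: list) -> int:
    """Return how many scenarios to draw from each cluster.

    Mirrors the expression from the traceback, but raises a descriptive
    error instead of a ZeroDivisionError when there are no clusters.
    """
    if not node_clusters:
        raise ValueError(
            "No node clusters found: no relationship in the knowledge graph "
            "passed the filter used to build clusters."
        )
    return math.ceil(n / len(node_clusters))


# Two clusters, three requested scenarios: ceil(3 / 2) per cluster.
print(samples_per_cluster(3, [{"a", "b"}, {"c", "d"}]))  # 2
```

With an empty list, the same call raises a ValueError that points at the knowledge graph rather than at the arithmetic.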

Below is a picture of one of the Relationship objects in my knowledge graph. Some of them have the properties overlapped_items and entities_overlap_score, but none has summary_similarity, which would explain why the check True if rel.get_property("summary_similarity") else False never yields any node_clusters.

[Image: Screenshot 2024-11-12 at 17 16 42]

So my question becomes the following: why would my Relationships not get any summary_similarity property set? Am I missing documents for my testset to be generated? The documentation is quite light at the moment; any help would be greatly appreciated!
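The same check can be done without a debugger by counting which property keys the relationships actually carry. The sketch below uses a hypothetical stand-in class, Rel, that only assumes a properties dict and a get_property method; the property values are invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Rel:
    """Hypothetical stand-in for a relationship object (illustration only)."""

    properties: dict = field(default_factory=dict)

    def get_property(self, key):
        return self.properties.get(key)


def property_counts(relationships):
    """Count how many relationships carry each property key."""
    counts = Counter()
    for rel in relationships:
        counts.update(rel.properties.keys())
    return counts


# Invented example data matching the situation described above.
rels = [
    Rel({"overlapped_items": [("a", "b")], "entities_overlap_score": 0.4}),
    Rel({"overlapped_items": [("c", "d")], "entities_overlap_score": 0.2}),
]
counts = property_counts(rels)
print(dict(counts))
print(counts["summary_similarity"])  # 0
```

A count of zero for summary_similarity means a filter on that property selects nothing, which is consistent with the empty node_clusters list.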

shahules786 commented 1 week ago

Hey, the reason could be that the default summary similarity threshold is too high for your docs. You can do either or both of the following: 1) modify and add your own transforms; 2) skip the MultiHopAbstractQuerySynthesizer query type by removing it from the query_distribution parameter. We know that the docs for Testgen are lagging, and we are trying our best to improve them.
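Option 2 could look roughly like the sketch below. It assumes query_distribution is a list of (synthesizer, weight) pairs, which should be checked against the installed version. The helper drop_synthesizer, the placeholder classes, and the weights are all hypothetical and not part of ragas.

```python
def drop_synthesizer(distribution, class_name):
    """Remove entries whose synthesizer class is named class_name and
    rescale the remaining weights so they sum to 1."""
    kept = [(s, w) for s, w in distribution if type(s).__name__ != class_name]
    total = sum(w for _, w in kept)
    if total == 0:
        raise ValueError("No synthesizers left after filtering.")
    return [(s, w / total) for s, w in kept]


# Placeholder classes standing in for real synthesizer instances.
class PlaceholderSynthesizerA:
    pass


class MultiHopAbstractQuerySynthesizer:
    pass


class PlaceholderSynthesizerB:
    pass


# Illustrative weights only.
distribution = [
    (PlaceholderSynthesizerA(), 0.5),
    (MultiHopAbstractQuerySynthesizer(), 0.25),
    (PlaceholderSynthesizerB(), 0.25),
]
filtered = drop_synthesizer(distribution, "MultiHopAbstractQuerySynthesizer")
print([(type(s).__name__, w) for s, w in filtered])
```

Rescaling keeps the relative proportions of the remaining query types; whether the library requires the weights to sum to 1 is an assumption here.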

malikbrh commented 1 week ago

Hey @shahules786, first of all, thanks for your reply! I kept debugging my testset generation and ended up with the same conclusion as you: the similarity threshold was too high for my first documents to pass, so I switched to new ones that pass this check. I am relatively new to RAG architecture, my bad.

But I still have a question. For testing purposes, I changed the default_filter in ragas/testset/persona.py.

After changing return random.random() < 0.25 to return True, my testset finally got generated; previously, this filter was removing my nodes in the generate_personas_from_kg method.

def default_filter(node: Node) -> bool:
    if (
        node.type.name == "DOCUMENT"
        and node.properties.get("summary_embedding") is not None
    ):
        return random.random() < 0.25
    else:
        return False

Any reason why this default_filter is coded this way? I understand the if conditions but couldn't find any explanation for the random call. Thanks in advance!
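A quick back-of-the-envelope calculation shows why a small document set is likely to lose every node to this filter. The sketch assumes each eligible node is kept independently with probability 0.25, as in the snippet above; the function name prob_no_node_sampled is made up for illustration.

```python
def prob_no_node_sampled(n_eligible: int, keep_prob: float = 0.25) -> float:
    """Probability that a random filter keeping each node independently
    with probability keep_prob ends up keeping none of n_eligible nodes."""
    return (1.0 - keep_prob) ** n_eligible


# With only two eligible nodes, all of them are dropped more often than not.
print(prob_no_node_sampled(2))  # 0.5625

# The chance of ending up with nothing shrinks as the document set grows.
for n in (5, 10, 50):
    print(n, prob_no_node_sampled(n))
```

So with two eligible documents, persona generation would receive no nodes in roughly 56% of runs, while larger document sets are far less affected.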

shahules786 commented 1 week ago

Hey @malikbrh, great, I'm amazed that you could debug it without much help from the docs. We would love any contributions from you to improve ragas. To answer your question: the idea was to sample random summaries from the given document set, cluster them, and use one summary from each cluster (as a representative of the cluster) to estimate the persona that could interact with it. This feature is very new, and I'm sure it can be improved further. Feel free to share any thoughts (consider joining our Discord). I just added some docs for it: https://docs.ragas.io/en/latest/howtos/customizations/testgenerator/_persona_generator/?h=pe#personas-in-testset-generation
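The sample, cluster, and pick-a-representative idea can be sketched as follows. This is not the ragas implementation: the greedy clustering strategy, the 0.75 cosine threshold, the function names, and the toy embeddings are all assumptions chosen to keep the example short.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def representatives(summaries, embeddings, threshold=0.75):
    """Greedy clustering: a summary starts a new cluster only if it is not
    similar enough to any existing cluster representative."""
    reps = []  # indices of the summaries chosen as cluster representatives
    for i, emb in enumerate(embeddings):
        if not any(cosine(emb, embeddings[r]) >= threshold for r in reps):
            reps.append(i)
    return [summaries[r] for r in reps]


# Toy data: the first two summaries point in nearly the same direction.
summaries = ["billing FAQ", "invoice guide", "API reference"]
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(representatives(summaries, embeddings))  # ['billing FAQ', 'API reference']
```

Each representative summary would then be passed to an LLM to describe a persona likely to read that kind of document.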