explodinggradients / ragas

Supercharge Your LLM Application Evaluations 🚀
https://docs.ragas.io
Apache License 2.0

MultiHopAbstractQuerySynthesizer testset generation is not working. #1696

Open kh-taher opened 1 day ago

kh-taher commented 1 day ago

[yes] I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug

When the query distribution is restricted to MultiHopAbstractQuerySynthesizer, test set generation fails.

Ragas version: 0.2.6
Python version: 3.9

Code to Reproduce

from ragas.testset import TestsetGenerator
from ragas.testset.synthesizers import default_query_distribution

# generator_llm, generator_embeddings, loaded_kg and personas are assumed to be defined beforehand.
query_distribution = default_query_distribution(llm=generator_llm)
new_q = [query_distribution[1]]  # select only MultiHopAbstractQuerySynthesizer
generator = TestsetGenerator(
    llm=generator_llm, embedding_model=generator_embeddings,
    knowledge_graph=loaded_kg, persona_list=personas,
)
testset = generator.generate(
    testset_size=20, with_debugging_logs=True,
    query_distribution=new_q, raise_exceptions=False,
)
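
Each entry of the distribution appears to be a (synthesizer, weight) pair, so picking a single entry keeps its original weight. Whether generate() needs the weights to sum to 1 is not confirmed here. As a sketch under that assumption, the hypothetical helper select_and_renormalize below keeps the chosen entries and rescales their weights; the placeholder strings and weights are illustrative, not the real defaults.

```python
def select_and_renormalize(distribution, indices):
    """Keep the entries at `indices` and rescale their weights to sum to 1."""
    selected = [distribution[i] for i in indices]
    total = sum(weight for _, weight in selected)
    if total <= 0:
        raise ValueError("selected weights must sum to a positive value")
    return [(item, weight / total) for item, weight in selected]


# Placeholder strings stand in for synthesizer objects; weights are illustrative.
distribution = [
    ("single_hop_specific", 0.5),
    ("multi_hop_abstract", 0.25),
    ("multi_hop_specific", 0.25),
]
new_q = select_and_renormalize(distribution, [1])
print(new_q)
```

This does not address the error reported below; it only removes the weight as a possible variable when reproducing.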

Error trace

When raise_exceptions=False:

Exception raised in Job[0]: ValueError(No clusters found in the knowledge graph. Try changing the relationship condition.)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[58], line 1
----> 1 testset = generator.generate(testset_size=20, with_debugging_logs=True, query_distribution=new_q, raise_exceptions=False)
      2 testset.to_pandas()

File c:\Users\khahmed\AppData\Local\miniforge3\envs\ragas_v.0\lib\site-packages\ragas\testset\synthesizers\generate.py:434, in TestsetGenerator.generate(self, testset_size, query_distribution, num_personas, run_config, batch_size, callbacks, token_usage_parser, with_debugging_logs, raise_exceptions)
    432 additional_testset_info: t.List[t.Dict] = []
    433 for i, (synthesizer, _) in enumerate(query_distribution):
--> 434     for sample in scenario_sample_list[i]:
    435         exec.submit(
    436             synthesizer.generate_sample,
    437             scenario=sample,
    438             callbacks=sample_generation_grp,
    439         )
    440         # fill out the additional info for the TestsetSample

TypeError: 'float' object is not iterable

When raise_exceptions=True:

ValueError                                Traceback (most recent call last)
Cell In[52], line 1
----> 1 testset = generator.generate(testset_size=20, with_debugging_logs=True, query_distribution=new_q, raise_exceptions=True)
      2 testset.to_pandas()

File c:\Users\khahmed\AppData\Local\miniforge3\envs\ragas_v.0\lib\site-packages\ragas\testset\synthesizers\generate.py:413, in TestsetGenerator.generate(self, testset_size, query_distribution, num_personas, run_config, batch_size, callbacks, token_usage_parser, with_debugging_logs, raise_exceptions)
    411 except Exception as e:
    412     scenario_generation_rm.on_chain_error(e)
--> 413     raise e
    414 else:
    415     scenario_generation_rm.on_chain_end(
    416         outputs={"scenario_sample_list": scenario_sample_list}
    417     )

File c:\Users\khahmed\AppData\Local\miniforge3\envs\ragas_v.0\lib\site-packages\ragas\testset\synthesizers\generate.py:410, in TestsetGenerator.generate(self, testset_size, query_distribution, num_personas, run_config, batch_size, callbacks, token_usage_parser, with_debugging_logs, raise_exceptions)
    401     exec.submit(
    402         scenario.generate_scenarios,
    403         n=splits[i],
   (...)
    406         callbacks=scenario_generation_grp,
    407     )
    409 try:
--> 410     scenario_sample_list: t.List[t.List[BaseScenario]] = exec.results()
    411 except Exception as e:
...
[remaining lines are from C:/Users/khahmed/AppData/Local/miniforge3/envs/ragas_v.0/lib/site-packages/ragas/testset/synthesizers/multi_hop/abstract.py]
     81     )
     82 num_sample_per_cluster = int(np.ceil(n / len(node_clusters)))
     84 for cluster in node_clusters:

ValueError: No clusters found in the knowledge graph. Try changing the relationship condition.
Output is truncated.
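
One plausible reading of the two traces, offered as an assumption rather than a confirmed diagnosis: with raise_exceptions=False the failed scenario job is only logged ("Exception raised in Job[0]") and its result is replaced by a float placeholder such as NaN, so scenario_sample_list[i] is a float and the loop at generate.py:434 fails. The sketch below reproduces that pattern with a hypothetical run_jobs helper and a stand-in job; it does not use any ragas code.

```python
import math


def failing_job():
    # Stand-in for the scenario generation job that fails in the report.
    raise ValueError("No clusters found in the knowledge graph.")


def run_jobs(jobs, raise_exceptions):
    """Hypothetical executor: run each job, optionally swallowing errors."""
    results = []
    for index, job in enumerate(jobs):
        try:
            results.append(job())
        except Exception as exc:
            if raise_exceptions:
                raise
            print(f"Exception raised in Job[{index}]: {exc!r}")
            results.append(math.nan)  # assumed placeholder for a failed job
    return results


scenario_sample_list = run_jobs([failing_job], raise_exceptions=False)

masked_error = None
try:
    # Mirrors the loop in the first trace: iterating a float placeholder.
    for sample in scenario_sample_list[0]:
        pass
except TypeError as exc:
    masked_error = str(exc)
    print(masked_error)
```

If this reading is right, the ValueError is the root cause and the TypeError is only a side effect of the swallowed exception.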

Expected behavior

Test set generation should produce samples successfully, as it does with SingleHopSpecificQuerySynthesizer and MultiHopSpecificQuerySynthesizer.

Additional context

My knowledge graph had the following structure:

KnowledgeGraph(nodes: 219, relationships: 794)

The relationships between the entities were created with JaccardSimilarityBuilder and OverlapScoreBuilder.
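
Since the error message points at the relationship condition, it may help to check which relationship types the graph contains and how many edges satisfy a given condition. This is a minimal sketch using a stand-in Relationship dataclass rather than the ragas classes. The field names, the example type names, and the condition are all assumptions for illustration; with a real graph one would presumably iterate loaded_kg.relationships instead.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Relationship:
    # Stand-in for a knowledge-graph edge; not the ragas class.
    type: str
    properties: dict = field(default_factory=dict)


def summarize_relationships(relationships, condition):
    """Count edges per type and how many satisfy the given condition."""
    by_type = Counter(rel.type for rel in relationships)
    matching = sum(1 for rel in relationships if condition(rel))
    return by_type, matching


# Illustrative edges only; the type and property names are made up.
edges = [
    Relationship("jaccard_similarity", {"score": 0.4}),
    Relationship("jaccard_similarity", {"score": 0.7}),
    Relationship("overlap_score", {"score": 0.9}),
]

# Hypothetical condition: an edge type that none of the edges above carry.
by_type, matching = summarize_relationships(
    edges, condition=lambda rel: rel.type == "summary_similarity"
)
print(by_type)
print(matching)
```

A count of zero matching edges would be consistent with the "No clusters found" error, though which condition the synthesizer actually applies is not shown in the trace.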