Open malikbrh opened 1 week ago
After a bit more investigation, I ran my code line by line and was able to make the following observations:
The first line of code that does not have the value I expect is in the method flagged by my error trace, ragas.testset.synthesizers.multi_hop.abstract.MultiHopAbstractQuerySynthesizer._generate_scenarios, where the node_clusters list (first line of that method) is empty.
Below is a picture of one of the Relationship objects in my KnowledgeGraph. Some of them have the properties overlapped_items and entities_overlap_score, but none has summary_similarity, which would explain why the check True if rel.get_property("summary_similarity") else False never yields any node_clusters.
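A quick way to check this outside the debugger (a rough sketch, assuming kg is the ragas.testset.graph.KnowledgeGraph built by the generator; accessors are the same ones used in the check above):

from collections import Counter

# Count which property names actually appear on the relationships in the graph.
prop_counts = Counter(prop for rel in kg.relationships for prop in rel.properties)
print(prop_counts)  # in my case: overlapped_items / entities_overlap_score, no summary_similarity

# The same condition _generate_scenarios relies on: relationships carrying summary_similarity.
with_summary_sim = [rel for rel in kg.relationships if rel.get_property("summary_similarity")]
print(f"{len(with_summary_sim)} relationships have summary_similarity")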
So my question becomes the following: why would my Relationships not get any summary_similarity property set?
Am I missing documents for my testset to be generated? The documentation is quite light at the moment; any help would be greatly appreciated!
Hey, the reason could be that the default summary similarity threshold is too high for your docs. You may do either or both of the following:
1) Modify and add your own transforms.
2) Skip the MultiHopAbstractQuerySynthesizer query type by removing it from the query_distribution parameter (see the sketch after this list).
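For option 2, a rough sketch, assuming you already have your generator, a wrapped generator_llm, and your llama_index documents from your existing setup; the import paths and keyword names below are from ragas 0.2.x as I recall them, so double-check them against your installed version:

from ragas.testset.synthesizers import (
    SingleHopSpecificQuerySynthesizer,
    MultiHopSpecificQuerySynthesizer,
)

# Keep only the synthesizers that do not depend on summary_similarity relationships.
query_distribution = [
    (SingleHopSpecificQuerySynthesizer(llm=generator_llm), 0.6),
    (MultiHopSpecificQuerySynthesizer(llm=generator_llm), 0.4),
]

testset = generator.generate_with_llamaindex_docs(
    documents,
    testset_size=10,
    query_distribution=query_distribution,
)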
We know that the docs for Testgen are lagging, and we are trying our best to improve them.
Hey @shahules786, first of all, thanks for your reply! I'm still debugging my problems with testset generation, and I finally reached the same conclusion as you: the similarity threshold was too high for my first documents to pass, so I switched to new ones that pass this check. I am relatively new to RAG architecture, my bad.
But I still have a question. For testing purposes, I changed the default_filter in ragas/testset/persona.py: by changing return random.random() < 0.25 to return True, my testset finally got generated. My nodes were being removed from the generate_personas_from_kg method by this filter.
import random

from ragas.testset.graph import Node


def default_filter(node: Node) -> bool:
    # Only document-level nodes that already carry a summary embedding are eligible...
    if (
        node.type.name == "DOCUMENT"
        and node.properties.get("summary_embedding") is not None
    ):
        # ...and even then, each one is only kept with 25% probability.
        return random.random() < 0.25
    else:
        return False
Any reason why this default_filter is coded this way? I understood the if conditions, but couldn't find any explanation for the random sampling. Thanks in advance!
Hey @malikbrh Great, amazed that you could debug it without much help from docs. Would love any contributions from you to improve ragas. To answer your question: the idea was to sample random summaries from the given document set, cluster them, and use one summary from each cluster (representative of that cluster) to estimate the persona that could interact with it. This feature is very new, and I'm sure it can be further improved. Feel free to share any thoughts (consider joining our Discord). Just added some docs for it: https://docs.ragas.io/en/latest/howtos/customizations/testgenerator/_persona_generator/?h=pe#personas-in-testset-generation
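As a rough sketch based on that page (field and parameter names are as documented for 0.2.x, so verify against your installed version), you can also skip the automatic persona extraction entirely by supplying personas yourself:

from ragas.testset.persona import Persona

# Hand-written personas for illustration; write ones that match your own documents.
personas = [
    Persona(
        name="Support Engineer",
        role_description="Troubleshoots customer issues using the product documentation.",
    ),
    Persona(
        name="New Joinee",
        role_description="New to the domain and asks broad, basic questions.",
    ),
]

# Pass them to the generator, e.g. TestsetGenerator(..., persona_list=personas),
# so generate_personas_from_kg (and its random document sampling) is not needed.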
[X] I have checked the documentation and related resources and couldn't resolve my bug.
Describe the bug I have a DivisionByZero error while generating my testset. The same code structure was working fine in v0.1; after migrating to v0.2 it broke. I have tried multiple models and finally settled on OpenAI GPT-4o-mini and text-embedding-3-small.
I added two documents to generate the testset, but it always fails at the same place. When exploring the KnowledgeGraph in the debugger, it looks fine, with multiple Nodes generated by the previous steps.
Ragas version: v0.2.4
Python version: v3.11.10
Code to Reproduce (Note: documents are llama_index documents)
kg = KnowledgeGraph()
Error trace
Expected behavior
I would expect the testset to be generated properly, or at least a more self-explanatory error. I do not really know what the next best steps to debug would be.
Additional context I can provide more details if needed; just ask me in the comments and I'll see what I can post, as my project is supposed to stay confidential. Thanks in advance for your help!