explodinggradients / ragas

Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines
https://docs.ragas.io
Apache License 2.0
5.7k stars 533 forks

Repetitive Data in Generated Synthetic Dataset #809

Open thamasha24 opened 3 months ago

thamasha24 commented 3 months ago

When creating a synthetic dataset from documents, many of the questions and answers end up being the same. (Out of 20 questions and answers, around 15 are identical.)
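To put a number on the repetition, a small check like the one below may help. It is only a sketch: it assumes the generated test set has been exported to a list of dicts, and the key names `question` and `answer` are hypothetical placeholders, not confirmed ragas column names.

```python
from collections import Counter


def duplicate_report(rows, keys=("question", "answer")):
    """Count rows whose normalized (question, answer) pair repeats.

    `rows` is assumed to be a list of dicts; the key names are
    hypothetical and should be adjusted to the actual export.
    """

    def normalize(text):
        # Lowercase and collapse whitespace so trivial variations still match.
        return " ".join(str(text).lower().split())

    counts = Counter(tuple(normalize(row[k]) for k in keys) for row in rows)
    total = sum(counts.values())
    unique = len(counts)
    return {"total": total, "unique": unique, "duplicates": total - unique}


if __name__ == "__main__":
    sample = [
        {"question": "What is RAG?", "answer": "Retrieval augmented generation."},
        {"question": "what is  rag?", "answer": "retrieval augmented generation."},
        {"question": "What is a node?", "answer": "A chunk of a document."},
    ]
    print(duplicate_report(sample))
```

Sharing the `total`, `unique`, and `duplicates` counts along with the number of input documents would make the report easier to reproduce.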

Nandakishore-Thekkadathu commented 1 month ago

I am also having this same issue!!

jjmachan commented 1 month ago

hey @thamasha24 and @Nandakishore-Thekkadathu, how many documents are you passing into the module for generation?

If there are not enough nodes, it might show this behaviour.
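One rough way to check whether there are enough nodes is to estimate how many chunks the documents yield and compare that with the number of questions requested. The sketch below is an assumption-heavy approximation: it splits on whitespace into fixed-size word windows, which is not how ragas builds its nodes, and `chunk_words` and `test_size` are hypothetical parameters chosen for illustration.

```python
def count_chunks(texts, chunk_words=200):
    """Approximate the number of fixed-size word chunks across documents.

    This is a stand-in for the real node splitter, used only to get a
    rough sense of how much distinct material is available.
    """
    total = 0
    for text in texts:
        n_words = len(text.split())
        if n_words:
            # Ceiling division: a partial final window still counts as a chunk.
            total += -(-n_words // chunk_words)
    return total


def enough_nodes(texts, test_size, chunk_words=200):
    """Return True if the estimated chunk count covers the requested size."""
    return count_chunks(texts, chunk_words) >= test_size


if __name__ == "__main__":
    docs = ["word " * 450, "a short document"]
    print(count_chunks(docs))
    print(enough_nodes(docs, test_size=20))
```

If the estimate is well below the requested number of questions, adding more or longer documents before generating is a reasonable first thing to try.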