Closed: jalling97 closed this issue 2 months ago
Most popular guides for setting up RAG evaluation datasets lean toward synthetic dataset generation rather than sharing out specific datasets that contain questions, answers, and context. Going forward, it will likely make the most sense to generate sample datasets from our contextual documents. Some potential options include:
For Needle in a Haystack (NIAH) evaluations, we can leverage some existing datasets, potentially including:
These proved to be pretty good, but I think it would be more advantageous to generate some simple examples instead.
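Since simple generated examples look preferable here, the generation step could look roughly like the sketch below. This is an illustrative assumption, not an existing dataset or tool: it plants a known "needle" sentence at a random depth in filler text and records the question/answer pair plus the relative depth, which is the variable NIAH evaluations typically sweep.

```python
import random

def make_niah_sample(needle, filler_sentences, haystack_len=20, seed=0):
    """Build one Needle-in-a-Haystack sample: insert a known 'needle'
    sentence at a random depth inside filler text, and record the
    question the model must answer by locating that needle."""
    rng = random.Random(seed)
    haystack = [rng.choice(filler_sentences) for _ in range(haystack_len)]
    depth = rng.randrange(len(haystack) + 1)
    haystack.insert(depth, needle)
    return {
        "context": " ".join(haystack),
        "question": "What is the secret passphrase mentioned in the text?",
        "answer": needle,
        "needle_depth": depth / haystack_len,  # relative position, 0.0 = start
    }

filler = [
    "The weather report predicted light rain.",
    "Quarterly numbers were filed on time.",
    "The conference was rescheduled to June.",
]
sample = make_niah_sample("The secret passphrase is 'blue heron'.", filler)
```

Sweeping `haystack_len` and `seed` gives needles at varying context lengths and depths, which is usually the axis NIAH results are plotted against.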
Final determination:
Of the open-source datasets available, none proved more valuable than creating a few datasets from scratch, so no open-source datasets will be used specifically for RAG evaluations.
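For the from-scratch datasets, a minimal template-based sketch of what generation from contextual documents might look like is below. Everything here is an assumption for illustration: the function name, the question templates, and the placeholder answer step (a real pipeline would use an LLM to draft the question and answer from each chunk).

```python
import random

def generate_qa_pairs(documents, num_pairs=3, seed=0):
    """Sketch of synthetic QA generation: sample document chunks and pair
    a templated question with the chunk as context. The answer is a
    placeholder; in practice an LLM would write question and answer."""
    rng = random.Random(seed)
    templates = [
        "What does this passage say about {topic}?",
        "Summarize the key point regarding {topic}.",
    ]
    pairs = []
    for _ in range(num_pairs):
        doc = rng.choice(documents)
        pairs.append({
            "question": rng.choice(templates).format(topic=doc["topic"]),
            "answer": doc["text"],   # placeholder; an LLM would draft this
            "context": doc["text"],
        })
    return pairs

docs = [
    {"topic": "retrieval", "text": "Retrieval selects relevant chunks."},
    {"topic": "generation", "text": "Generation conditions on retrieved text."},
]
dataset = generate_qa_pairs(docs)
```

Each record carries the question/answer/context triple described above, so the output can be evaluated the same way an open-source RAG dataset would be.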
Description
We need to choose a small number (1-3, depending on size) of open-source RAG evaluation datasets. Having at least one open-source dataset lets us start running basic evaluations and iterating on improvements.
Relevant Links
Many datasets can be found on HuggingFace