defenseunicorns / leapfrogai

Production-ready Generative AI for local, cloud native, airgap, and edge deployments.
https://leapfrog.ai

(spike) Create a Custom RAG Evaluations Dataset #716

Closed: jalling97 closed this issue 1 month ago

jalling97 commented 2 months ago

Description

In order to develop mission-focused evaluations, a custom RAG evaluations dataset is necessary. This dataset should be based on RAG tasks that are representative of current use cases.

Completion Criteria

Relevant Links

Consider using DeepEval Synthesizer to generate a first draft of the dataset automatically.
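For reference, a minimal sketch of what that first pass could look like with DeepEval's Synthesizer. This is a sketch only: the document paths are placeholders, and the exact method and parameter names may differ between DeepEval versions.

```python
# Draft-dataset generation sketch using DeepEval's Synthesizer.
# NOTE: placeholder document paths; method/parameter names may vary by DeepEval version.
from deepeval.synthesizer import Synthesizer

synthesizer = Synthesizer()

# Generate question / expected-answer "goldens" from local source documents.
goldens = synthesizer.generate_goldens_from_docs(
    document_paths=["docs/example_doc_1.pdf", "docs/example_doc_2.md"],  # placeholders
    max_goldens_per_context=2,
)

# Spot-check the generated pairs before curating them by hand.
for golden in goldens:
    print(golden.input, "->", golden.expected_output)
```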

alekst23 commented 1 month ago

It may be difficult to obtain a sufficiently large dataset that is "mission specific". However, that is not strictly necessary for running evals.

There are existing datasets available, like this one: https://www.nature.com/articles/s41597-023-02068-4

Abstract: The BioASQ question answering (QA) benchmark dataset contains questions in English, along with golden standard (reference) answers and related material. The dataset has been designed to reflect real information needs of biomedical experts and is therefore more realistic and challenging than most existing datasets. Furthermore, unlike most previous QA benchmarks that contain only exact answers, the BioASQ-QA dataset also includes ideal answers (in effect summaries), which are particularly useful for research on multi-document summarization.

jalling97 commented 1 month ago

Current draft QA dataset: https://huggingface.co/datasets/jalling/LFAI_RAG_qa_v1
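For anyone who wants to poke at it, the draft should load directly with the Hugging Face datasets library. The split and column names below are assumptions; check the dataset card for the actual schema.

```python
# Quick inspection of the draft QA dataset from Hugging Face.
from datasets import load_dataset

qa_dataset = load_dataset("jalling/LFAI_RAG_qa_v1")

print(qa_dataset)              # lists available splits and columns
print(qa_dataset["train"][0])  # first example, assuming a "train" split
```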

jalling97 commented 1 month ago

Current draft NIAH (needle-in-a-haystack) dataset: https://huggingface.co/datasets/jalling/LFAI_RAG_niah_v1
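Once the drafts settle, rows from either dataset could be mapped into DeepEval test cases for the RAG metrics. A rough sketch follows; the column names and the generate_answer helper are hypothetical placeholders, not the actual schema or pipeline.

```python
# Hypothetical mapping from draft-dataset rows to DeepEval test cases.
# Column names ("question", "answer", "context") and generate_answer() are
# placeholders, not the real schema or pipeline.
from datasets import load_dataset
from deepeval.test_case import LLMTestCase


def generate_answer(question: str) -> str:
    # Stand-in for the RAG pipeline under test (e.g. a LeapfrogAI API call).
    return "model answer goes here"


niah_dataset = load_dataset("jalling/LFAI_RAG_niah_v1")

test_cases = []
for row in niah_dataset["train"]:  # assumed split name
    test_cases.append(
        LLMTestCase(
            input=row["question"],                           # assumed column
            expected_output=row["answer"],                   # assumed column
            actual_output=generate_answer(row["question"]),  # placeholder RAG call
            retrieval_context=[row["context"]],              # assumed column
        )
    )
```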