defenseunicorns / leapfrogai

Production-ready Generative AI for local, cloud native, airgap, and edge deployments.
https://leapfrog.ai

(spike) Create a Custom RAG Evaluations Dataset #716

Closed: jalling97 closed this issue 1 month ago

jalling97 commented 2 months ago

Description

In order to develop mission-focused evaluations, a custom RAG evaluations dataset is necessary. This dataset should be based on RAG tasks that are representative of current use cases.

Completion Criteria

Relevant Links

Consider using DeepEval Synthesizer to generate a first draft of the dataset automatically.
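For reference, a minimal sketch of what that first pass could look like with DeepEval's Synthesizer. This is a sketch only: the document paths are placeholders, and the exact method and parameter names may differ between DeepEval versions.

```python
# Draft-dataset generation sketch using DeepEval's Synthesizer.
# NOTE: placeholder document paths; method/parameter names may vary by DeepEval version.
from deepeval.synthesizer import Synthesizer

synthesizer = Synthesizer()

# Generate question / expected-answer "goldens" from local source documents.
goldens = synthesizer.generate_goldens_from_docs(
    document_paths=["docs/example_doc_1.pdf", "docs/example_doc_2.md"],  # placeholders
    max_goldens_per_context=2,
)

# Spot-check the generated pairs before curating them by hand.
for golden in goldens:
    print(golden.input, "->", golden.expected_output)
```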

alekst23 commented 1 month ago

It may be difficult to obtain a sufficiently large dataset that is "mission specific". However, that is not strictly necessary for running evals.

There are existing datasets available, like this one: https://www.nature.com/articles/s41597-023-02068-4

Abstract: The BioASQ question answering (QA) benchmark dataset contains questions in English, along with golden standard (reference) answers and related material. The dataset has been designed to reflect real information needs of biomedical experts and is therefore more realistic and challenging than most existing datasets. Furthermore, unlike most previous QA benchmarks that contain only exact answers, the BioASQ-QA dataset also includes ideal answers (in effect summaries), which are particularly useful for research on multi-document summarization.

jalling97 commented 1 month ago

Current draft QA dataset: https://huggingface.co/datasets/jalling/LFAI_RAG_qa_v1
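For anyone who wants to poke at it, the draft should load directly with the Hugging Face datasets library. The split and column names below are assumptions; check the dataset card for the actual schema.

```python
# Quick inspection of the draft QA dataset from Hugging Face.
from datasets import load_dataset

qa_dataset = load_dataset("jalling/LFAI_RAG_qa_v1")

print(qa_dataset)              # lists available splits and columns
print(qa_dataset["train"][0])  # first example, assuming a "train" split
```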

jalling97 commented 1 month ago

Current draft NIAH (needle-in-a-haystack) dataset: https://huggingface.co/datasets/jalling/LFAI_RAG_niah_v1
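Once the drafts settle, rows from either dataset could be mapped into DeepEval test cases for the RAG metrics. A rough sketch follows; the column names and the generate_answer helper are hypothetical placeholders, not the actual schema or pipeline.

```python
# Hypothetical mapping from draft-dataset rows to DeepEval test cases.
# Column names ("question", "answer", "context") and generate_answer() are
# placeholders, not the real schema or pipeline.
from datasets import load_dataset
from deepeval.test_case import LLMTestCase


def generate_answer(question: str) -> str:
    # Stand-in for the RAG pipeline under test (e.g. a LeapfrogAI API call).
    return "model answer goes here"


niah_dataset = load_dataset("jalling/LFAI_RAG_niah_v1")

test_cases = []
for row in niah_dataset["train"]:  # assumed split name
    test_cases.append(
        LLMTestCase(
            input=row["question"],                           # assumed column
            expected_output=row["answer"],                   # assumed column
            actual_output=generate_answer(row["question"]),  # placeholder RAG call
            retrieval_context=[row["context"]],              # assumed column
        )
    )
```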