defenseunicorns / leapfrogai

Production-ready Generative AI for local, cloud native, airgap, and edge deployments.
https://leapfrog.ai
Apache License 2.0
254 stars 29 forks

ADR: Model Evaluation Toolset #194

Open gphorvath opened 1 year ago

jalling97 commented 5 months ago

Based on our discussions today, it seems like this ADR is influenced by the following priorities:

1. A need to evaluate RAG specifically (near term)
2. A need to perform all other model evaluations (summarization, answer relevancy, contextual precision, hallucination, bias, etc.) (not needed for minor release 0.8.0)

If this is correct, is this ADR motivated primarily by priority 1, with priority 2 considered but less important? Or should the eval framework described in this ADR take both into account equally?

jalling97 commented 5 months ago

Since model evaluation (priority 2 as listed above) is a separate concern from RAG evaluation (they can leverage the same framework, but they do not have to), this issue will be removed from the 0.8.0 milestone and issue #196 will be brought into the milestone in its place.
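To illustrate the "same framework, separate concerns" point above: one way to let RAG evaluation and general model evaluation share infrastructure is a common metric interface over a shared test-case type. The sketch below is hypothetical (not LeapfrogAI code); the names `EvalCase`, `Metric`, `ExactMatch`, `ContextRecall`, and `run_suite` are illustrative assumptions, and the metrics are deliberately toy versions of the real ones (answer relevancy, contextual precision, etc.).

```python
# Hypothetical sketch, not LeapfrogAI's actual eval framework: shows how RAG
# metrics and general model metrics could sit behind one shared interface.
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class EvalCase:
    """A single evaluation example; RAG cases also carry retrieved context."""
    prompt: str
    model_output: str
    expected_output: str
    retrieved_context: list[str] = field(default_factory=list)


class Metric(Protocol):
    """Anything with a name and a per-case score fits the framework."""
    name: str

    def score(self, case: EvalCase) -> float: ...


class ExactMatch:
    """General model metric: 1.0 if the output matches the reference exactly."""
    name = "exact_match"

    def score(self, case: EvalCase) -> float:
        return float(case.model_output.strip() == case.expected_output.strip())


class ContextRecall:
    """Toy RAG metric: fraction of retrieved chunks echoed in the output."""
    name = "context_recall"

    def score(self, case: EvalCase) -> float:
        if not case.retrieved_context:
            return 0.0
        hits = sum(chunk in case.model_output for chunk in case.retrieved_context)
        return hits / len(case.retrieved_context)


def run_suite(cases: list[EvalCase], metrics: list[Metric]) -> dict[str, float]:
    """Average each metric over all cases; RAG and non-RAG metrics mix freely."""
    return {
        metric.name: sum(metric.score(c) for c in cases) / len(cases)
        for metric in metrics
    }
```

Under this kind of design, shipping RAG evaluation first (the 0.8.0 priority) just means shipping the RAG metrics first; the broader model-evaluation metrics can be added later without reworking the runner.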