ADR: Model Evaluation Toolset

defenseunicorns / leapfrogai

Production-ready Generative AI for local, cloud native, airgap, and edge deployments.

Apache License 2.0

Based on our discussions today, it seems like this ADR is influenced by the following priorities:

1) A need to evaluate RAG specifically (near term)
2) A need to perform all other model evaluations (Summarization, Answer Relevancy, Contextual Precision, Hallucination, Bias, etc.) (not needed for minor release 0.8.0)

If this is correct, is this ADR motivated primarily by priority 1, with priority 2 considered but less important? Or should the eval framework described in this ADR take both into account equally?

ADR: Model Evaluation Toolset #194