Open gphorvath opened 1 year ago
Since model evaluation (priority 2) is a separate concern from RAG evaluation (the two can share an evaluation framework, but they do not have to), this issue will be removed from the 0.8.0 milestone and issue #196 will be moved into that milestone instead.
Based on our discussions today, this ADR appears to be driven by the following priorities:
1) A need to evaluate RAG specifically (near term).
2) A need to perform all other model evaluations (Summarization, Answer Relevancy, Contextual Precision, Hallucination, Bias, etc.) (not needed for minor release 0.8.0).
If this is correct, is this ADR motivated primarily by priority 1, with priority 2 considered but less important? Or should the evaluation framework described in this ADR give equal weight to both?