
Explore LLM Evaluation with LangSmith #19


907Resident commented 9 months ago

Context:

Evaluating the output of RAG models is becoming increasingly important as these models continue to be developed and deployed. Although there is no accepted standard for evaluating RAG model output, the work associated with this issue is an attempt to provide a workflow for evaluating output programmatically.

Objective:

Demonstrate how to evaluate RAG model output against annotated question-answer pairs. Additionally, highlight how various RAG models respond to a known set of questions and answers. The set can be small, but it should contain at least 10 pairs. A rough sketch of what this could look like with LangSmith is given below.
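
One possible shape for this workflow, given the LangSmith focus of this issue: register the annotated QA pairs as a LangSmith dataset, wrap the RAG chain in a target function, and score each output with an evaluator. This is a minimal sketch assuming a recent `langsmith` Python SDK and a `LANGCHAIN_API_KEY` in the environment; the dataset name `rag-eval-demo`, the `answer_my_rag` stub, and the `exact_match` evaluator are hypothetical placeholders, not this project's actual code.

```python
from langsmith import Client
from langsmith.evaluation import evaluate

client = Client()

# 1. Register a small annotated dataset (at least 10 QA pairs in practice;
#    one is shown here for brevity).
dataset = client.create_dataset(
    "rag-eval-demo",
    description="Annotated QA pairs for RAG evaluation",
)
client.create_examples(
    inputs=[{"question": "What does RAG stand for?"}],
    outputs=[{"answer": "Retrieval-Augmented Generation"}],
    dataset_id=dataset.id,
)


def answer_my_rag(inputs: dict) -> dict:
    """Hypothetical target: replace this stub with a call into the
    RAG chain under evaluation."""
    return {"answer": "Retrieval-Augmented Generation"}


def exact_match(run, example) -> dict:
    """Toy evaluator: score 1 if the prediction matches the annotated
    answer exactly (case-insensitive), else 0."""
    predicted = run.outputs["answer"].strip().lower()
    reference = example.outputs["answer"].strip().lower()
    return {"key": "exact_match", "score": int(predicted == reference)}


# 2. Run the target over the dataset and score each output; results
#    appear as an experiment in the LangSmith UI.
results = evaluate(
    answer_my_rag,
    data="rag-eval-demo",
    evaluators=[exact_match],
    experiment_prefix="rag-eval",
)
```

Exact match is deliberately strict and mostly useful as a smoke test; for free-form RAG answers, one would likely swap in a semantic or LLM-as-judge evaluator (LangSmith ships several prebuilt ones) while keeping the same dataset-plus-evaluator structure.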

Path to Completion:

References: