elbruno opened 2 months ago
Thanks, @elbruno! Am I correct in thinking that for now this focuses on evaluating results with different models, but not necessarily evaluating custom prompts with these models?
One common flow is to have a set of source data that will be used with RAG in an application, and to generate an initial set of ground truth data from it - to help speed up getting started with curating ground truth data. How would I do that with this solution? For example, if I have insurance benefit documentation that I want to use in my app, how could I use this solution to create some initial ground truth data based on that documentation?
Related - how do you think we'd use this to help evaluate custom prompts/chat backends?
Got the point, and yes, for a detailed scenario or industry a tweak is needed, similar to what @mahomedalid is doing now. A possible workaround would be to add a two-paragraph section on "How to create a new custom evaluator", which would include a custom prompt and a couple of C# lines of code.
Thoughts on that?
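As a rough illustration of what that section could cover, a custom evaluator for the insurance-benefits scenario might pair a domain-specific prompt with a small C# class. Everything below is a hypothetical sketch: `IChatClient`, `CompleteAsync`, and the class name are placeholders, not the sample's actual API.

```csharp
// Hypothetical custom evaluator sketch - names and interfaces are
// illustrative placeholders, not part of the llm-eval sample's real API.
public class InsuranceBenefitsEvaluator
{
    // Custom prompt tailored to the insurance-benefits domain.
    private const string EvaluatorPrompt = """
        You are an evaluator for an insurance-benefits assistant.
        Given a QUESTION, the expected ANSWER, and the model RESPONSE,
        score the RESPONSE from 1 (incorrect) to 5 (fully correct and
        grounded in the benefits documentation). Reply with the number only.
        """;

    private readonly IChatClient _chatClient; // assumed chat abstraction

    public InsuranceBenefitsEvaluator(IChatClient chatClient)
        => _chatClient = chatClient;

    public async Task<int> EvaluateAsync(
        string question, string groundTruth, string response)
    {
        var userMessage =
            $"QUESTION: {question}\nANSWER: {groundTruth}\nRESPONSE: {response}";

        // Send the evaluator prompt plus the case under test to the model.
        var result = await _chatClient.CompleteAsync(EvaluatorPrompt, userMessage);

        // Parse the model's numeric score; fall back to 0 if unparseable.
        return int.TryParse(result.Trim(), out var score) ? score : 0;
    }
}
```

The same shape could serve the ground-truth generation flow discussed above: swap the scoring prompt for one that asks the model to propose question/answer pairs from a chunk of the benefits documentation, then have a human curate the output.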
The [llm-eval] folder includes: