dotnet / ai-samples

MIT License

LLMEval - Project to run evaluations on a LLM #49

Open elbruno opened 7 months ago

elbruno commented 7 months ago

The [llm-eval] folder includes:

jmatthiesen commented 7 months ago

Thanks, @elbruno! Am I correct in thinking that for now this focuses on evaluating results with different models, but not necessarily evaluating custom prompts with these models?

One common flow is to start from a set of source data that will be used with RAG in an application and generate an initial set of ground truth data from it, which speeds up getting started with curating ground truth data. How would I do that with this solution? For example, if I have insurance benefit documentation that I want to use in my app, how could I use this solution to create some initial ground truth data based on that documentation?

Related - how do you think we'd use this to help evaluate custom prompts/chat backends?

elbruno commented 6 months ago

Got the point, and yes, a detailed scenario or industry needs some tweaking, similar to what @mahomedalid is doing now. A possible workaround would be to add a two-paragraph section on "How to create a new custom evaluator" that includes a custom prompt and the couple of lines of C# code needed to use it.

Thoughts on that?

LoganDavidTalbot commented 2 months ago

Hello, when is this getting merged? There is documentation linked to this on MS Learn:

https://learn.microsoft.com/en-us/azure/ai-studio/concepts/evaluation-approach-gen-ai

codemillmatt commented 2 months ago

@jmatthiesen and @elbruno, where are we on this one? Looks like we may be ready to go?