Open elbruno opened 7 months ago
Thanks, @elbruno! Am I correct in thinking that for now this focuses on evaluating results with different models, but not necessarily evaluating custom prompts with these models?
One common flow is to have a set of source data that will be used with RAG in an application, and to generate an initial set of ground truth data from it - to help speed up getting started with curating ground truth data. How would I do that with this solution? For example, if I have insurance benefit documentation that I want to use in my app, how could I use this solution to create some initial ground truth data based on that documentation?
Related - how do you think we'd use this to help evaluate custom prompts/chat backends?
Got the point, and yes, a tweak is needed for a detailed scenario/industry, similar to what @mahomedalid is doing now. A possible workaround would be to add a two-paragraph section on "How to create a new custom evaluator" that includes a custom prompt and a couple of C# lines of code, along the lines of the sketch below.
Thoughts on that?
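A minimal sketch of what such a custom evaluator could look like. The names here (`CustomGroundednessEvaluator`, `EvaluationResult`, the delegate-based chat call) are illustrative assumptions, not the repo's actual API; the idea is just a custom prompt plus a few lines of C# that score a response with whatever chat model you plug in:

```csharp
// Hypothetical sketch of a custom evaluator: type and member names are
// illustrative, not taken from the llm-eval folder.
using System;
using System.Threading.Tasks;

public record EvaluationResult(double Score, string RawResponse);

public class CustomGroundednessEvaluator
{
    // Custom prompt that drives the evaluation; tweak it for your scenario/industry.
    private const string SystemPrompt = """
        You are an evaluator. Given a QUESTION, a CONTEXT, and an ANSWER,
        rate from 1 (not grounded) to 5 (fully grounded) how well the ANSWER
        is supported by the CONTEXT. Reply with only the number.
        """;

    // Any chat-completion call can be plugged in here (Azure OpenAI, a local model, etc.).
    private readonly Func<string, Task<string>> _chatCompletion;

    public CustomGroundednessEvaluator(Func<string, Task<string>> chatCompletion)
        => _chatCompletion = chatCompletion;

    public async Task<EvaluationResult> EvaluateAsync(string question, string context, string answer)
    {
        // Combine the custom prompt with the sample being evaluated.
        var prompt = $"{SystemPrompt}\n\nQUESTION: {question}\nCONTEXT: {context}\nANSWER: {answer}";

        var raw = await _chatCompletion(prompt);

        // Parse the model's numeric verdict; fall back to 0 if it replies with anything else.
        var score = double.TryParse(raw.Trim(), out var s) ? s : 0;
        return new EvaluationResult(score, raw);
    }
}
```

The point of keeping the evaluator behind a simple delegate is that the same custom prompt can then be run against different models when comparing results.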
Hello, when is this getting merged? There is documentation linked to this on MS Learn:
https://learn.microsoft.com/en-us/azure/ai-studio/concepts/evaluation-approach-gen-ai
@jmatthiesen and @elbruno where are we on this one? Looks like we may be ready to go?
The [llm-eval] folder includes: