Provide a model/prompt evaluation building block for .NET devs

JakeRadMSFT commented 3 months ago

Today, the Python Evaluation building block can be used against a .NET backend that uses the Chat Protocol (Azure Search supports this). However, we know from customer feedback (it is a top request from MVPs) that customers want a .NET-native solution. The goal is to deliver .NET-specific building block documentation and sample code to support this. This is still in the design phase, but the latest thinking is that we may adapt a community DotNet LLM Eval sample built by a fellow Microsoft employee (Maho Pacheco).

Work to be done:
• Evaluate options for LLM evaluation and land on the best-practice guidance we’d give to devs. Decide if we’ll lead with the Python implementation for Build.
• Continue investigating what is needed for a .NET-native evaluation block.
• If we continue adapting the DotNet LLM Eval sample, we’d need to:
  o Likely provide an analysis UI that isn’t Grafana, for now aiming for parity with Pamela’s output.
  o Figure out a solution for generating questions and answers to use as “ground truth”. Currently the Azure AI synthetics library, used by the Python block, does not have a .NET equivalent.
  o Update to support Semantic Kernel 1.x.
• Create guidance documentation.

JakeRadMSFT commented 3 months ago

https://github.com/Azure-Samples/azure-search-openai-demo-csharp/pull/299

jmatthiesen commented 3 months ago

Assigned to @elbruno who's looking into this building block & next steps.

dotnet / ai-samples

Provide a model/prompt evaluation building block for .NET devs #36