Today, the Python Evaluation building block can be used against a .NET backend that uses the Chat Protocol (Azure Search supports this). However, customer feedback shows that developers want a .NET-native solution; it is a top request from MVPs. The goal is to deliver .NET-specific building block documentation and sample code to meet this need. This work is still in the design phase, but the current thinking is that we may adapt a community DotNet LLM Eval sample built by a fellow Microsoft employee, Maho Pacheco.
Work to be done:
• Evaluate the options for LLM evaluation and settle on the best-practice guidance we'd give to developers. Decide whether we'll lead with the Python implementation for Build.
• Continue investigating what is needed for a .NET-native evaluation block.
• If we continue adapting the DotNet LLM Eval sample, we'd need to:
o Likely provide an analysis UI other than Grafana, aiming for now at parity with Pamela's output.
o Find a way to generate questions and answers to use as "ground truth"; the Azure AI synthetics library used by the Python block currently has no .NET equivalent.
o Update the sample to support Semantic Kernel 1.x.
• Create guidance documentation.