Azure-Samples / ai-rag-chat-evaluator

Tools for evaluation of RAG Chat Apps using Azure AI Evaluate SDK and OpenAI
MIT License

Support/prioritize local prompt metrics #50

Closed: pamelafox closed this 7 months ago

pamelafox commented 7 months ago

Purpose

This PR adds support for local prompt metrics using Jinja2 templates and the evaluate SDK's PromptMetric.from_template functionality. It also refactors the metric definitions so they have consistently defined aggregation functions.
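As a rough illustration of what "consistently defined aggregation functions" can look like, here is a minimal sketch in which every metric exposes the same aggregation method. It is not the code from this PR: the `RatingMetric` class, its `aggregate` method, the `summarize` helper, the pass threshold of 4, and the metric names are all hypothetical. Template loading through `PromptMetric.from_template` is not shown.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class RatingMetric:
    """Hypothetical metric whose per-answer score is a 1-5 rating, or None on failure."""

    name: str
    pass_threshold: int = 4  # assumed cutoff for a "passing" rating

    def aggregate(self, scores: list[Optional[int]]) -> dict[str, float]:
        # Skip answers whose rating could not be produced or parsed.
        valid = [s for s in scores if s is not None]
        if not valid:
            return {"mean_rating": 0.0, "pass_rate": 0.0}
        passed = sum(1 for s in valid if s >= self.pass_threshold)
        return {
            "mean_rating": round(mean(valid), 2),
            "pass_rate": round(passed / len(valid), 2),
        }


def summarize(
    metrics: list[RatingMetric], results: dict[str, list[Optional[int]]]
) -> dict[str, dict[str, float]]:
    # Every metric is aggregated through the same interface, so callers
    # need no per-metric special cases.
    return {m.name: m.aggregate(results[m.name]) for m in metrics}


if __name__ == "__main__":
    metrics = [RatingMetric("groundedness"), RatingMetric("relevance")]
    results = {"groundedness": [5, 4, 3, None], "relevance": [2, 5]}
    print(summarize(metrics, results))
```

Because each metric owns its aggregation, a summary step like `summarize` can stay generic when new local prompt metrics are added.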

It also makes the review tools more generic. That does mean more scrollbars now.

Does this introduce a breaking change?

[ ] Yes
[X] No

Pull Request Type

What kind of change does this Pull Request introduce?

[ ] Bugfix
[X] Feature
[ ] Code style update (formatting, local variables)
[ ] Refactoring (no functional changes, no api changes)
[ ] Documentation content changes
[ ] Other... Please describe:

How to Test