Closed by pamelafox 1 month ago
/evaluate
Starting evaluation! Check the Actions tab for progress, or wait for a comment with the results.
metric | stat | baseline | pr111 |
---|---|---|---|
gpt_groundedness | pass_rate | 1.0 | 1.0 |
↑ | mean_rating | 5.0 | 5.0 |
gpt_relevance | pass_rate | 1.0 | 1.0 |
↑ | mean_rating | 5.0 | 5.0 |
answer_length | mean | 978.9 | 901.2 |
latency | mean | 2.51 | 2.2 |
citation_match | rate | 1.0 | 1.0 |
num_questions | total | 10 | 10 |
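The two rows that differ between runs, answer_length and latency, are easier to compare as relative changes. The snippet below is a minimal sketch of that arithmetic using the values from the table; the `results` dictionary and the `relative_change` helper are hypothetical names introduced for illustration and are not part of the evaluation workflow.

```python
# Values copied from the evaluation table above (baseline vs. pr111).
# The dictionary layout is an assumption made for this illustration.
results = {
    "answer_length": {"baseline": 978.9, "pr": 901.2},
    "latency": {"baseline": 2.51, "pr": 2.2},
}


def relative_change(baseline: float, pr: float) -> float:
    """Return the change from baseline to pr as a percentage of baseline."""
    return (pr - baseline) / baseline * 100


# Percentage change per metric; negative means the PR value is lower.
changes = {
    name: relative_change(values["baseline"], values["pr"])
    for name, values in results.items()
}

for metric, pct in changes.items():
    print(f"{metric}: {pct:+.1f}%")
```

Under these numbers, answer length drops by roughly 8% and latency by roughly 12%, while the groundedness, relevance, and citation-match rows are unchanged. With only 10 questions in the run, differences of this size are best read as indicative.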
Purpose
Does this introduce a breaking change?
When developers merge from main and run the server, azd up, or azd deploy, will this produce an error? If you're not sure, try it out on an old environment.
Type of change
Code quality checklist
See CONTRIBUTING.md for more details.
I ran python -m pytest.
I ran python -m pytest --cov to verify 100% coverage of added lines.
I ran python -m mypy to check for type errors.
I ran ruff manually on my code.