pamelafox closed this 1 month ago
Starting evaluation! Check the Actions tab for progress, or wait for a comment with the results.
| metric | stat | baseline | pr120 |
|---|---|---|---|
| gpt_groundedness | pass_rate | 1.0 | 0.9 |
| gpt_groundedness | mean_rating | 5.0 | 4.6 |
| gpt_relevance | pass_rate | 1.0 | 0.4 |
| gpt_relevance | mean_rating | 5.0 | 2.8 |
| answer_length | mean | 978.9 | 4223.4 |
| latency | mean | 2.51 | 6.67 |
| citation_match | rate | 1.0 | 1.0 |
| num_questions | total | 10 | 10 |
/evaluate