pamelafox closed 1 month ago
Starting evaluation! Check the Actions tab for progress, or wait for a comment with the results.
| metric | stat | baseline | pr119 |
|---|---|---|---|
| gpt_groundedness | mean_rating | 5.0 | 5.0 |
| ↑ | pass_rate | 1.0 | 1.0 |
| gpt_relevance | mean_rating | 5.0 | 5.0 |
| ↑ | pass_rate | 1.0 | 1.0 |
| answer_length | mean | 978.9 | 925.7 |
| latency | mean | 2.51 | 2.11 |
| citation_match | rate | 1.0 | 1.0 |
| num_questions | total | 10 | 10 |
/evaluate