pamelafox closed 1 month ago
Starting evaluation! Check the Actions tab for progress, or wait for a comment with the results.
| metric | stat | baseline | pr105 |
|---|---|---|---|
| gpt_groundedness | pass_rate | 1.0 | 1.0 |
| ↑ | mean_rating | 5.0 | 5.0 |
| gpt_relevance | pass_rate | 1.0 | 1.0 |
| ↑ | mean_rating | 5.0 | 5.0 |
| answer_length | mean | 978.9 | 927.6 |
| latency | mean | 2.51 | 2.07 |
| citation_match | rate | 1.0 | 1.0 |
| num_questions | total | 10 | 10 |
/evaluate