pamelafox closed 1 month ago
Starting evaluation! Check the Actions tab for progress, or wait for a comment with the results.
| metric | stat | baseline | pr124 |
|---|---|---|---|
| gpt_groundedness | pass_rate | 1.0 | 1.0 |
| ↑ | mean_rating | 5.0 | 5.0 |
| gpt_relevance | pass_rate | 1.0 | 1.0 |
| ↑ | mean_rating | 5.0 | 5.0 |
| answer_length | mean | 1017.6 | 995.8 |
| latency | mean | 2.56 | 1.94 |
| citations_matched | mean | 0.73 | 0.73 |
/evaluate