reneenoble closed 1 month ago
/evaluate
Starting evaluation! Check the Actions tab for progress, or wait for a comment with the results.
| metric | stat | baseline | pr113 |
|---|---|---|---|
| gpt_groundedness | pass_rate | 1.0 | 0.0 |
| gpt_groundedness | mean_rating | 5.0 | 1.0 |
| gpt_relevance | pass_rate | 1.0 | 0.0 |
| gpt_relevance | mean_rating | 5.0 | 1.0 |
| answer_length | mean | 978.9 | 230.0 |
| latency | mean | 2.51 | 1.23 |
| citation_match | rate | 1.0 | 1.0 |
| num_questions | total | 10 | 10 |
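For anyone reading the `pass_rate` and `mean_rating` rows, here is a minimal sketch of how two such stats could be derived from per-question ratings. The function name `summarize_ratings` and the pass threshold of 4 are assumptions made for illustration; the thread does not say how the evaluator defines a pass.

```python
from statistics import mean


def summarize_ratings(ratings, pass_threshold=4):
    """Summarize 1-5 ratings as a pass rate and a mean rating.

    pass_threshold is a hypothetical cutoff (assumed here, not taken
    from the evaluation tool): a rating at or above it counts as a pass.
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    passed = sum(1 for r in ratings if r >= pass_threshold)
    return {
        "pass_rate": passed / len(ratings),
        "mean_rating": float(mean(ratings)),
    }


# Ten questions all rated 5, then ten questions all rated 1.
print(summarize_ratings([5] * 10))  # {'pass_rate': 1.0, 'mean_rating': 5.0}
print(summarize_ratings([1] * 10))  # {'pass_rate': 0.0, 'mean_rating': 1.0}
```

Under that assumed threshold, a run where every one of the 10 questions is rated 1 yields a pass rate of 0.0 and a mean rating of 1.0, which is the pattern in the pr113 column.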
Update prompt info for system.