danielericlee closed this 1 month ago.
Autoevals (main-1726010835)

| Score | Average | Improvements | Regressions |
|---|---|---|---|
| NumericDiff | 73.5% (0pp) | - | - |
| Duration | 1.54s (+0.03s) | 23 🟢 | 77 🔴 |
| Prompt_tokens | 279.25 (+0) | - | - |
| Completion_tokens | 16.9 (+0) | - | - |
| Total_tokens | 296.14 (+0) | - | - |
It turns out that when `output` and `expected` are exactly the same, the cosine similarity util we use can return numbers slightly greater than 1. I verified this by adding `console.log` calls in the test suite. The test suite was happily treating `1.0000000009` as close enough to `1`.

So, I modified the `scaleScore` function, which massages the value returned by `cossim`, so that it never returns anything larger than 1. I also updated the test to ensure we don't regress here.
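The sketch below shows how rounding in a cosine similarity can overshoot 1 and how a clamp handles it. It is a hypothetical sketch, not the actual autoevals source. The names `cosineSimilarity` and `scaleScore`, and the assumption that `scaleScore` rescales a similarity from `[expectedMin, 1]` onto `[0, 1]`, are illustrative only. The PR also suggests why the overshoot slipped through. An approximate assertion such as Jest's `toBeCloseTo(1)`, whose default tolerance is `Math.abs(expected - received) < 0.005`, accepts `1.0000000009`. A bound check such as `score <= 1` would reject it. Whether this suite actually uses Jest is an assumption.

```typescript
// Hypothetical sketch: function names and the expectedMin parameter are
// assumptions for illustration, not the real autoevals implementation.

// Plain cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Rounding in the sums, square roots, and division means identical
  // inputs are not guaranteed to produce exactly 1.
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Assumed behavior: map a similarity from [expectedMin, 1] onto [0, 1].
// Clamping both ends means floating-point overshoot can never exceed 1.
function scaleScore(score: number, expectedMin: number): number {
  const scaled = (score - expectedMin) / (1 - expectedMin);
  return Math.min(Math.max(scaled, 0), 1);
}
```

With the clamp in place, `scaleScore(1.0000000009, 0.7)` returns exactly `1`. A regression test can therefore assert `score <= 1` directly instead of relying on approximate equality.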