braintrustdata / autoevals

AutoEvals is a tool for quickly and easily evaluating AI model outputs using best practices.
MIT License

Ensure EmbeddingSimilarity scores do not exceed 1 #95

Closed by danielericlee 1 month ago

danielericlee commented 1 month ago

It turns out that when the output and the expected value are exactly the same, the cosine similarity util we are using can return numbers slightly greater than 1.

I verified this by console.logging in the test suite, which was happily treating 1.0000000009 as close enough to 1.


So, I modified the scaleScore function, which massages the value returned by cossim, so that it never returns anything larger than 1.
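To illustrate the fix, here is a minimal sketch of the idea: a naive cosine similarity can drift slightly above 1 due to floating-point rounding, so the rescaling step clamps its result into [0, 1]. The function names, the `expectedMin` parameter, and its default value are illustrative assumptions, not autoevals' actual implementation.

```javascript
// Hypothetical sketch of cosine similarity; floating-point rounding
// means identical vectors can produce a value slightly above 1.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Assumed rescaling: map scores from [expectedMin, 1] into [0, 1],
// then clamp so rounding error can never push the score above 1.
function scaleScore(score, expectedMin = 0.7) {
  return Math.min(1, Math.max(0, (score - expectedMin) / (1 - expectedMin)));
}
```

With this clamp in place, even an input like 1.0000000009 scales to exactly 1 rather than overshooting it.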

I also updated the test to ensure we don't regress here.

github-actions[bot] commented 1 month ago

Braintrust eval report

Autoevals (main-1726010835)

| Score | Average | Improvements | Regressions |
|---|---|---|---|
| NumericDiff | 73.5% (0pp) | - | - |
| Duration | 1.54s (+0.03s) | 23 🟢 | 77 🔴 |
| Prompt_tokens | 279.25 (+0) | - | - |
| Completion_tokens | 16.9 (+0) | - | - |
| Total_tokens | 296.14 (+0) | - | - |