braintrustdata / autoevals

AutoEvals is a tool for quickly and easily evaluating AI model outputs using best practices.
MIT License

Ensure EmbeddingSimilarity scores do not exceed 1 #95

Closed by danielericlee 1 month ago

danielericlee commented 1 month ago

It turns out that when the output and the expected value are exactly the same, the cosine similarity util we are using can return numbers slightly greater than 1.

I verified this by console.logging in the test suite, which was happily treating 1.0000000009 as close enough to 1.


So, I modified the scaleScore function, which massages the value returned by cossim, so that it never returns anything larger than 1.
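To illustrate the fix, here is a minimal sketch of the idea: a naive cosine similarity can drift slightly above 1 due to floating-point rounding, so the rescaling step clamps its result into [0, 1]. The function names, the `expectedMin` parameter, and its default value are illustrative assumptions, not autoevals' actual implementation.

```javascript
// Hypothetical sketch of cosine similarity; floating-point rounding
// means identical vectors can produce a value slightly above 1.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Assumed rescaling: map scores from [expectedMin, 1] into [0, 1],
// then clamp so rounding error can never push the score above 1.
function scaleScore(score, expectedMin = 0.7) {
  return Math.min(1, Math.max(0, (score - expectedMin) / (1 - expectedMin)));
}
```

With this clamp in place, even an input like 1.0000000009 scales to exactly 1 rather than overshooting it.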

I also updated the test to ensure we don't regress here.

github-actions[bot] commented 1 month ago

Braintrust eval report

Autoevals (main-1726010835)

| Score | Average | Improvements | Regressions |
|---|---|---|---|
| NumericDiff | 73.5% (0pp) | - | - |
| Duration | 1.54s (+0.03s) | 23 🟢 | 77 🔴 |
| Prompt_tokens | 279.25 (+0) | - | - |
| Completion_tokens | 16.9 (+0) | - | - |
| Total_tokens | 296.14 (+0) | - | - |