empirical-run / empirical

Test and evaluate LLMs and model configurations, across all the scenarios that matter for your application
https://docs.empirical.run
MIT License
141 stars, 10 forks

fix: tool dataset #211

Closed: saikatmitra91 closed this pull request 2 months ago.

changeset-bot[bot] commented 2 months ago

⚠️ No Changeset found

Latest commit: 8711ddb4e6a4e5b8dec2344f74393673d96efa4b

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets. When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types.

github-actions[bot] commented 2 months ago

Empirical Run Summary

                 Run #8ade: gpt-3.5-turbo    Run #791e: gpt-4-turbo-preview
Outputs          100%                        100%
Scores
  json-syntax    100%                        100%
Avg latency      981 ms                      1680 ms

Total dataset samples: 2
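The summary reports, for each model, a json-syntax score and an average latency over the dataset samples. As an illustration only, the sketch below shows how figures like these could be computed. It assumes that json-syntax means "the output parses as JSON" and that average latency is the arithmetic mean over samples. The `SampleResult` shape and the `jsonSyntaxScore` and `summarize` functions are hypothetical and are not Empirical's actual API or scorer implementation.

```typescript
// Hypothetical shape of one evaluated sample; the field names are
// illustrative assumptions, not Empirical's actual output schema.
interface SampleResult {
  output: string;    // raw model output text
  latencyMs: number; // wall-clock time for the model call, in milliseconds
}

// Assumed json-syntax semantics: 1 if the output parses as JSON, else 0.
function jsonSyntaxScore(output: string): number {
  try {
    JSON.parse(output);
    return 1;
  } catch {
    return 0;
  }
}

interface RunSummary {
  jsonSyntaxPct: number; // share of samples with parseable JSON, in percent
  avgLatencyMs: number;  // arithmetic mean latency across samples
}

// Aggregates per-sample results into run-level figures.
function summarize(samples: SampleResult[]): RunSummary {
  if (samples.length === 0) {
    throw new Error("cannot summarize an empty run");
  }
  const passed = samples.reduce((n, s) => n + jsonSyntaxScore(s.output), 0);
  const totalLatency = samples.reduce((t, s) => t + s.latencyMs, 0);
  return {
    jsonSyntaxPct: (100 * passed) / samples.length,
    avgLatencyMs: totalLatency / samples.length,
  };
}

// Usage with two made-up samples (not data from the runs above):
const example = summarize([
  { output: '{"tool": "search"}', latencyMs: 900 },
  { output: "not json", latencyMs: 1100 },
]);
console.log(example); // { jsonSyntaxPct: 50, avgLatencyMs: 1000 }
```

With only 2 samples, each sample moves the json-syntax score by 50 percentage points. Results at this size are a smoke test rather than a reliable comparison between models.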