Empirical is the fastest way to test different LLMs and model configurations, across all the scenarios that matter for your application.
With Empirical, you can run your test scenarios against multiple models and configurations, and compare the outputs side-by-side in a web app.
Demo video: https://github.com/empirical-run/empirical/assets/284612/65d96ecc-12a2-474d-a81e-bbddb71106b6
Empirical bundles together a test runner and a web app, both of which are used through the CLI in your terminal window.
Empirical relies on a configuration file, typically located at `empiricalrc.js`, which describes the test to run.
In this example, we will ask an LLM to extract entities from user messages and give us structured JSON output. For example, "I'm Alice from Maryland" should become `{"name": "Alice", "location": "Maryland"}`.
Our test will succeed if the model outputs valid JSON.
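A configuration for this test might look like the following. This is an illustrative sketch only: the exact schema (the `runs` entries, the `{{user_message}}` placeholder syntax, and the `is-json` scorer name) are assumptions here, and the generated sample file is the authoritative reference.

```javascript
// empiricalrc.js — illustrative sketch; field names and scorer types are
// assumptions, so compare against the sample file generated by the CLI.
module.exports = {
  // Each run is one model configuration to test.
  runs: [
    {
      type: "model",
      provider: "openai",
      model: "gpt-3.5-turbo",
      prompt:
        "Extract the name and location from this message as JSON: {{user_message}}",
    },
  ],
  // The dataset lists the scenarios (inputs) to test against.
  dataset: {
    samples: [
      { inputs: { user_message: "I'm Alice from Maryland" } },
      { inputs: { user_message: "This is Bob, writing from Seattle" } },
    ],
  },
  // Score each output: here, pass if the model emitted valid JSON.
  scorers: [{ type: "is-json" }],
};
```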
Use the CLI to create a sample configuration file called `empiricalrc.js`.
npm init empiricalrun
# For TypeScript
npm init empiricalrun -- --using-ts
Run the example dataset against the selected models.
npx empiricalrun
This step requires the `OPENAI_API_KEY` environment variable to authenticate with OpenAI (e.g. `export OPENAI_API_KEY=<your key>`). This execution will cost approximately $0.0026, based on the selected models.
Use the `ui` command to open the reporter web app and see side-by-side results.
npx empiricalrun ui
Edit the `empiricalrc.js` file to make Empirical work for your use-case.
See development docs.
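For instance, to compare a second model against the first, you could add another entry to the `runs` array. Again, this is a hedged sketch: the run schema and model names below are assumptions for illustration.

```javascript
// Illustrative sketch: two runs let the reporter show model outputs
// side-by-side. Field names are assumptions; verify against the docs.
module.exports = {
  runs: [
    {
      type: "model",
      provider: "openai",
      model: "gpt-3.5-turbo",
      prompt: "Extract the name and location from this message as JSON: {{user_message}}",
    },
    {
      type: "model",
      provider: "openai",
      model: "gpt-4",
      prompt: "Extract the name and location from this message as JSON: {{user_message}}",
    },
  ],
  // dataset and scorers stay the same as before
};
```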