MIT License

# Empirical


Empirical is the fastest way to test different LLMs and model configurations across all the scenarios that matter for your application.

With Empirical, you can run your test datasets against multiple models and model configurations, and compare their outputs side by side in a reporter web app.

## Usage

See all docs → https://docs.empirical.run

Empirical bundles together a test runner and a web app, both of which are used through the CLI in your terminal.

Empirical relies on a configuration file, typically located at `empiricalrc.js`, which describes the tests to run.

### Start with a basic example

In this example, we will ask an LLM to extract entities from user messages and return a structured JSON output. For example, "I'm Alice from Maryland" becomes `{name: 'Alice', location: 'Maryland'}`.

Our test will succeed if the model outputs valid JSON.
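
The pass criterion here amounts to "does the output parse as JSON". As a standalone illustration of that check (this is not Empirical's built-in scorer, just the underlying idea):

```javascript
// Standalone illustration of the pass criterion: does the output parse as JSON?
// This is NOT Empirical's internal scorer, just the underlying idea.
function isValidJson(output) {
  try {
    JSON.parse(output);
    return true;
  } catch {
    return false;
  }
}

console.log(isValidJson('{"name": "Alice", "location": "Maryland"}')); // true
console.log(isValidJson("I'm Alice from Maryland"));                   // false
```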

  1. Use the CLI to create a sample configuration file called `empiricalrc.js`.

    npm init empiricalrun
    
    # For TypeScript
    npm init empiricalrun -- --using-ts
  2. Run the example dataset against the selected models.

    npx empiricalrun

    This step requires the `OPENAI_API_KEY` environment variable to authenticate with OpenAI. Running the example costs approximately $0.0026, based on the selected models.

  3. Use the `ui` command to open the reporter web app and see side-by-side results.

    npx empiricalrun ui
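
Putting the steps above together, a typical first session looks like this on a Unix-like shell (the API key value is a placeholder; use your own):

```shell
# Scaffold a sample empiricalrc.js in the current directory
npm init empiricalrun

# Authenticate with OpenAI (placeholder value)
export OPENAI_API_KEY="<your-api-key>"

# Run the example dataset against the configured models,
# then inspect the results side by side in the web app
npx empiricalrun
npx empiricalrun ui
```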

### Make it yours

Edit the `empiricalrc.js` file to make Empirical work for your use case.
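
To give a sense of its shape, here is a rough, hypothetical sketch of a configuration for the entity-extraction example. The field and property names below are illustrative assumptions, not the verified schema — consult https://docs.empirical.run for the actual format:

```javascript
// Hypothetical sketch of an empiricalrc.js for the entity-extraction example.
// Field names ("runs", "dataset", "samples", etc.) are assumptions for
// illustration; see the official docs for the real schema.
module.exports = {
  // Each run is one model configuration to test and compare
  runs: [
    {
      provider: "openai",
      model: "gpt-3.5-turbo",
      prompt: "Extract the name and location from: {{user_message}}",
    },
    {
      provider: "openai",
      model: "gpt-4",
      prompt: "Extract the name and location from: {{user_message}}",
    },
  ],
  // The scenarios to run every model configuration against
  dataset: {
    samples: [
      { inputs: { user_message: "I'm Alice from Maryland" } },
    ],
  },
};
```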

## Contribution guide

See the development docs.