MIT License

# Empirical


Empirical is the fastest way to test different LLMs and model configurations across all the scenarios that matter for your application.

With Empirical, you can run your test datasets against multiple models and model configurations, and compare their outputs side by side in a reporter web app.

## Usage

See all docs → https://docs.empirical.run

Empirical bundles together a test runner and a web app, both of which are used through the CLI in your terminal.

Empirical relies on a configuration file, typically located at `empiricalrc.js`, which describes the tests to run.

### Start with a basic example

In this example, we will ask an LLM to extract entities from user messages and return a structured JSON output. For example, "I'm Alice from Maryland" becomes `{name: 'Alice', location: 'Maryland'}`.

Our test will succeed if the model outputs valid JSON.
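
The pass criterion here amounts to "does the output parse as JSON". As a standalone illustration of that check (this is not Empirical's built-in scorer, just the underlying idea):

```javascript
// Standalone illustration of the pass criterion: does the output parse as JSON?
// This is NOT Empirical's internal scorer, just the underlying idea.
function isValidJson(output) {
  try {
    JSON.parse(output);
    return true;
  } catch {
    return false;
  }
}

console.log(isValidJson('{"name": "Alice", "location": "Maryland"}')); // true
console.log(isValidJson("I'm Alice from Maryland"));                   // false
```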

  1. Use the CLI to create a sample configuration file called `empiricalrc.js`.

    npm init empiricalrun
    
    # For TypeScript
    npm init empiricalrun -- --using-ts
  2. Run the example dataset against the selected models.

    npx empiricalrun

    This step requires the `OPENAI_API_KEY` environment variable to authenticate with OpenAI. Running the example costs approximately $0.0026, based on the selected models.

  3. Use the `ui` command to open the reporter web app and see side-by-side results.

    npx empiricalrun ui
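
Putting the steps above together, a typical first session looks like this on a Unix-like shell (the API key value is a placeholder; use your own):

```shell
# Scaffold a sample empiricalrc.js in the current directory
npm init empiricalrun

# Authenticate with OpenAI (placeholder value)
export OPENAI_API_KEY="<your-api-key>"

# Run the example dataset against the configured models,
# then inspect the results side by side in the web app
npx empiricalrun
npx empiricalrun ui
```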

### Make it yours

Edit the `empiricalrc.js` file to make Empirical work for your use case.
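
To give a sense of its shape, here is a rough, hypothetical sketch of a configuration for the entity-extraction example. The field and property names below are illustrative assumptions, not the verified schema — consult https://docs.empirical.run for the actual format:

```javascript
// Hypothetical sketch of an empiricalrc.js for the entity-extraction example.
// Field names ("runs", "dataset", "samples", etc.) are assumptions for
// illustration; see the official docs for the real schema.
module.exports = {
  // Each run is one model configuration to test and compare
  runs: [
    {
      provider: "openai",
      model: "gpt-3.5-turbo",
      prompt: "Extract the name and location from: {{user_message}}",
    },
    {
      provider: "openai",
      model: "gpt-4",
      prompt: "Extract the name and location from: {{user_message}}",
    },
  ],
  // The scenarios to run every model configuration against
  dataset: {
    samples: [
      { inputs: { user_message: "I'm Alice from Maryland" } },
    ],
  },
};
```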

## Contribution guide

See the development docs.