FullFact / health-misinfo-shared

Raphael health misinformation project, shared by Full Fact and Google
MIT License

Write an evaluation script using promptfoo #73

Closed: dearden closed this issue 2 months ago

dearden commented 2 months ago

Overview

We've researched several GenAI evaluation tools (#38), but the scripts written so far are only toy examples.

We should write a script that compares prompts on representative data, which we can reuse for evaluating new prompts and new models.

Requirements

  1. A script (or scripts) that lets us run at least one prompt through promptfoo and get usable metrics back
  2. The process should be automatic and repeatable
  3. Documentation of the process, including what the evaluation doesn't tell us
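
As a starting point, the requirements above could be sketched roughly like this. It is a minimal sketch, not the project's actual script. The prompt text, the `transcript` variable, the `vertex:gemini-pro` provider id, the file names and the example test case are all placeholders made up for illustration. It also assumes promptfoo is available through `npx`, that it accepts a JSON config file, and that its JSON output has a `results.results` list whose entries carry a boolean `success` field; those details should be checked against the installed promptfoo version.

```python
"""Sketch: run one prompt through promptfoo and report a pass rate."""
import json
import subprocess
from pathlib import Path

# Placeholder values, not taken from the project.
PROMPT = "List the health claims made in this transcript:\n{{transcript}}"
PROVIDER = "vertex:gemini-pro"  # assumed provider id; swap for the model under test


def build_config(prompt, provider, cases):
    """Build a promptfoo config dict: one prompt, one provider, one test per case."""
    return {
        "prompts": [prompt],
        "providers": [provider],
        "tests": [
            {
                "vars": {"transcript": case["transcript"]},
                # Case-insensitive substring check on the model output.
                "assert": [{"type": "icontains", "value": case["expected"]}],
            }
            for case in cases
        ],
    }


def build_command(config_path, output_path):
    """Return the promptfoo CLI call that evaluates a config and writes results."""
    return ["npx", "promptfoo", "eval", "-c", str(config_path), "-o", str(output_path)]


def summarise(rows):
    """Turn per-test result rows into simple counts and a pass rate."""
    total = len(rows)
    passed = sum(1 for row in rows if row.get("success"))
    return {
        "total": total,
        "passed": passed,
        "pass_rate": passed / total if total else 0.0,
    }


def main():
    # Hypothetical test case; real runs should load representative labelled data.
    cases = [{"transcript": "Garlic cures the flu.", "expected": "garlic"}]
    config_path = Path("promptfooconfig.json")
    output_path = Path("results.json")
    config_path.write_text(json.dumps(build_config(PROMPT, PROVIDER, cases), indent=2))
    # check=False: a non-zero exit is not treated as fatal here, on the
    # assumption that failed assertions can also produce one.
    subprocess.run(build_command(config_path, output_path), check=False)
    data = json.loads(output_path.read_text())
    # Assumed output layout; verify against the promptfoo version in use.
    print(summarise(data["results"]["results"]))
```

Calling `main()` writes the config, runs the evaluation and prints the summary, so the run is repeatable from a single command. A pass rate built from substring checks only says whether expected text appeared in the output, not whether the extracted claims are complete or correct, which is the kind of limitation requirement 3 asks us to write down.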