FullFact / health-misinfo-shared

Raphael health misinformation project, shared by Full Fact and Google
MIT License

evaluation should support calculated summaries #154

Closed c-j-johnston closed 3 months ago

c-j-johnston commented 3 months ago

Overview

In #91, we changed how summary labels for claims were generated -- instead of the LLM determining them, they are now calculated with a scoring system.

At the time, we did not update the promptfoo evaluation to support this change, so it still expects the summary to already be present in the labels dict.

We should add the summary as soon as the LLM inference is complete.

Requirements

  1. In the promptfoo evaluation, the function that generates a claim summary should be run on each claim as soon as the LLM has returned it.
  2. The labelled data should also have the summary label added when it is loaded.
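
As a rough illustration of both requirements, the sketch below applies one helper to LLM output and to labelled data alike. Everything in it is an assumption made for illustration: `calculate_summary`, the score table, the label names and the summary strings are hypothetical stand-ins, and the real scoring function is the one introduced in #91.

```python
from typing import Any, Callable

# Hypothetical score table; the real scoring system from #91 may differ.
HYPOTHETICAL_SCORES = {"high": 2, "medium": 1, "low": 0}


def calculate_summary(labels: dict[str, str]) -> str:
    """Stand-in for the project's summary function (assumed, not the real one)."""
    total = sum(
        HYPOTHETICAL_SCORES.get(value, 0)
        for key, value in labels.items()
        if key != "summary"  # never score a stale summary
    )
    if total >= 4:
        return "worth checking"
    if total >= 2:
        return "may be worth checking"
    return "not worth checking"


def add_summary(
    claim: dict[str, Any],
    summarise: Callable[[dict[str, str]], str] = calculate_summary,
) -> dict[str, Any]:
    """Return a copy of the claim whose labels dict includes a calculated summary."""
    labels = dict(claim.get("labels", {}))
    labels["summary"] = summarise(labels)
    return {**claim, "labels": labels}


def add_summaries(claims: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Apply add_summary to every claim, e.g. LLM output or loaded labelled data."""
    return [add_summary(claim) for claim in claims]
```

Calling `add_summaries` once on the parsed LLM response and once on the labelled data as it is loaded would mean both sides of the comparison carry a `summary` key computed the same way.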

Notes and additional information