Make paraphrasing of claims stay close to the transcript

Overview

Currently, when we ask Gemini to identify and extract claims, it paraphrases them. This is useful: the paraphrased claims are more readable than the raw transcript, and each one can stand alone without needing extra context.

However, Gemini also tries to be helpful by adding extra context that isn't in the transcript. In some cases, this can change the meaning quite significantly.

For example, if a transcript says "i love carrots you know they're so crunchy carrots make you see in the dark", Gemini may summarise this as "Carrots are good for night vision because they're rich in vitamin A". The vitamin A explanation appears nowhere in the transcript.
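
A rough way to triage cases like the carrot example is to list the content words in an extracted claim that never occur in the source chunk. The sketch below is only a heuristic for deciding which rows a human should look at first, not the project's method: `unsupported_words` and the small `STOPWORDS` set are hypothetical names introduced here, and plain word overlap will also flag harmless rewording (such as "night vision" for "see in the dark").

```python
import re

# Illustrative stopword list (an assumption, not a project resource).
STOPWORDS = {
    "a", "an", "and", "are", "because", "be", "for", "in", "is",
    "of", "so", "the", "they", "they're", "to", "you",
}


def tokens(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def unsupported_words(claim, chunk):
    """Return content words in the claim that never appear in the chunk."""
    chunk_words = set(tokens(chunk))
    missing = []
    for word in tokens(claim):
        if word in STOPWORDS or word in chunk_words or word in missing:
            continue
        missing.append(word)
    return missing


chunk = "i love carrots you know they're so crunchy carrots make you see in the dark"
claim = "Carrots are good for night vision because they're rich in vitamin A"

# "rich" and "vitamin" point at the added explanation; "night" and "vision"
# are ordinary paraphrase, which is why a human still has to judge each row.
print(unsupported_words(claim, chunk))
```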

Requirements

[x] Evaluate current performance: generate a spreadsheet of extracted claims & original transcript and count how often the claim is 'wrong' (i.e. where the meaning of the claim has changed in a non-trivial way from the transcript)
[ ] If necessary, experiment with prompts to encourage Gemini to stick closer to the transcript: the transcript should only be edited for readability, not for meaning; any context added to the claim should come from the input text (i.e. the chunk)
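
The two requirements above could be sketched roughly as follows. Everything here is an assumption made for illustration: the column names (`chunk`, `claim`, `wrong`), the `y`/`n` labelling convention, the helper names, and the wording of `PROMPT_TEMPLATE`, which is one possible starting point for prompt experiments and not the prompt currently in use. No Gemini call is shown; the sketch covers only building the prompt text and scoring a manually labelled sheet.

```python
import csv
import io

# Hypothetical prompt wording to experiment with (not the project's prompt).
PROMPT_TEMPLATE = (
    "Extract the claims made in the transcript chunk below.\n"
    "Rewrite each claim only for readability; do not change its meaning.\n"
    "Any context you add must come from the chunk itself.\n\n"
    "Chunk:\n{chunk}\n"
)


def build_prompt(chunk):
    """Fill the template with one transcript chunk."""
    return PROMPT_TEMPLATE.format(chunk=chunk)


def write_review_sheet(rows, out):
    """Write chunk/claim pairs as CSV, with a 'wrong' column for reviewers."""
    writer = csv.DictWriter(out, fieldnames=["chunk", "claim", "wrong"])
    writer.writeheader()
    for row in rows:
        writer.writerow(
            {"chunk": row["chunk"], "claim": row["claim"], "wrong": row.get("wrong", "")}
        )


def wrong_rate(sheet):
    """Return the share of reviewed rows labelled 'y', or None if none are reviewed."""
    labels = [r["wrong"].strip().lower() for r in csv.DictReader(sheet)]
    reviewed = [label for label in labels if label in ("y", "n")]
    if not reviewed:
        return None
    return reviewed.count("y") / len(reviewed)


rows = [
    {"chunk": "carrots make you see in the dark",
     "claim": "Carrots help night vision because they're rich in vitamin A",
     "wrong": "y"},
    {"chunk": "carrots make you see in the dark",
     "claim": "Carrots make you see in the dark",
     "wrong": "n"},
    {"chunk": "i love carrots", "claim": "The speaker loves carrots"},  # not yet reviewed
]

sheet = io.StringIO()
write_review_sheet(rows, sheet)
sheet.seek(0)
print(wrong_rate(sheet))
```

Unreviewed rows are excluded from the rate, so the figure can be recomputed as labelling progresses and compared between prompt variants on the same set of chunks.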

FullFact / health-misinfo-shared

Make paraphrasing of claims stay close to the transcript #37