FullFact / health-misinfo-shared

Raphael health misinformation project, shared by Full Fact and Google
MIT License

Make paraphrasing of claims stay close to the transcript #37

Closed: dcorney closed this issue 2 months ago

dcorney commented 3 months ago

Overview

Currently, when we ask Gemini to identify and extract claims, it paraphrases them. This is good because it makes the claims more readable than the raw transcript. Part of the aim is to make each claim stand alone, without needing extra context.

However, Gemini also tries to be helpful by adding extra context that isn't in the transcript. In some cases, this can change the meaning quite significantly.

E.g. if a transcript says "i love carrots you know they're so crunchy carrots make you see in the dark", Gemini may summarise this as "Carrots are good for night vision because they're rich in vitamin A".
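
One crude way to surface this kind of drift is to list the content words in a paraphrased claim that never occur in the transcript. The sketch below is only an illustration of that idea, not the project's method: the `STOPWORDS` set and the `content_words` and `unsupported_words` helpers are hypothetical names invented here, and plain word overlap will also flag harmless rewording.

```python
import re

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {
    "a", "an", "the", "is", "are", "for", "in", "of", "to", "and",
    "i", "you", "know", "so", "make", "because", "they're",
}


def content_words(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}


def unsupported_words(transcript, claim):
    """Return content words in the claim that never occur in the transcript."""
    return content_words(claim) - content_words(transcript)


transcript = "i love carrots you know they're so crunchy carrots make you see in the dark"
claim = "Carrots are good for night vision because they're rich in vitamin A"

print(sorted(unsupported_words(transcript, claim)))
# → ['good', 'night', 'rich', 'vision', 'vitamin']
```

Here "night" and "vision" are a fair rewording of "see in the dark", while "rich" and "vitamin" are added context, so a check like this can only shortlist claims for a human to review.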

Requirements

dcorney commented 2 months ago

Evaluation: I took c. 200 extracted claims and compared them to the source transcripts: https://docs.google.com/spreadsheets/d/1DgvkLrLHfZeHJMB4kgZ2Y2ezsrecE9u1DwVrI-5ZKcs/edit#gid=82981182

About 95% were perfectly correct. Some of the rest were apparent hallucinations that actually appeared elsewhere in the same video. One hallucination was not in the video at all, but was factually 'correct'.

So in this sample, our use of Gemini generated no misleading claims.
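
A manual review like the one above could be tallied with a few lines of code. This is a minimal sketch under assumptions: the label names and the `summarise_labels` helper are hypothetical, and the toy labels are made up for illustration rather than taken from the spreadsheet.

```python
from collections import Counter


def summarise_labels(labels):
    """Return each label's share of the total as a percentage."""
    counts = Counter(labels)
    total = len(labels)
    return {label: 100 * n / total for label, n in counts.items()}


# Toy labels for illustration only; these are not the real evaluation data.
toy_labels = ["correct", "correct", "correct", "elsewhere_in_video"]

print(summarise_labels(toy_labels))
# → {'correct': 75.0, 'elsewhere_in_video': 25.0}
```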