Overview

We have recently asked Full Fact's health fact checkers to annotate some claims.
We now want to use that data for in-context learning, i.e. putting the annotated examples directly into the prompt for few-shot/many-shot learning. (We might later also use the data to fine-tune a model and use that model for inference.)
Requirements
[x] write scripts to load in the annotated data. Some fields may have been left blank and should be filled with appropriate default values: e.g. if 'understandability' is 'vague' and the remaining fields have not been annotated, fill them with "not a claim", "not medical", "can't tell" and "can't tell". The resulting filtered data should be saved in a suitable format (probably JSON). See the sketch after this list.
[x] write a function to load in this filtered JSON and use it to construct a prompt (also covered in the sketch below)
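A minimal sketch of both requirements, using only the Python standard library. Only 'understandability', 'vague', and the four default labels come from the annotation scheme above; the column names claim_type, topic, checkability, harm, and claim are hypothetical stand-ins for the real CSV headers, and the prompt layout is just one plausible format:

```python
import csv
import json

# Hypothetical column names for the four remaining annotation fields,
# mapped to the default labels from the requirement above.
DEFAULTS = {
    "claim_type": "not a claim",
    "topic": "not medical",
    "checkability": "can't tell",
    "harm": "can't tell",
}

def load_annotations(csv_path):
    """Read one annotator's CSV, filling blank fields with defaults
    whenever 'understandability' was marked 'vague'."""
    rows = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if (row.get("understandability") or "").strip().lower() == "vague":
                for field, default in DEFAULTS.items():
                    if not (row.get(field) or "").strip():
                        row[field] = default
            rows.append(row)
    return rows

def save_filtered(rows, json_path):
    """Write the filtered annotations out as JSON."""
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(rows, f, indent=2, ensure_ascii=False)

def build_prompt(json_path, new_claim):
    """Load the filtered JSON and lay each example out as a few-shot
    block, ending with the unannotated claim for the model to label."""
    with open(json_path, encoding="utf-8") as f:
        examples = json.load(f)
    blocks = []
    for ex in examples:
        blocks.append(
            f"Claim: {ex['claim']}\n"
            f"Understandability: {ex['understandability']}\n"
            f"Claim type: {ex['claim_type']}"
        )
    blocks.append(f"Claim: {new_claim}\nUnderstandability:")
    return "\n\n".join(blocks)
```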
Notes and additional information
We'll also want to do some evaluation. The simplest approach might be to split the annotated set, using part for in-context learning and the rest for evaluation.
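One way to do that split (a sketch; the holdout size and seed are arbitrary placeholders): shuffle once with a fixed seed so the partition is reproducible, then keep the held-out examples out of the prompt entirely.

```python
import random

def split_for_eval(examples, n_eval=50, seed=0):
    """Shuffle with a fixed seed so the split is reproducible, then hold
    out n_eval examples for evaluation; the rest go into the prompt."""
    rng = random.Random(seed)
    pool = list(examples)
    rng.shuffle(pool)
    return pool[n_eval:], pool[:n_eval]  # (in_context, held_out)
```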
We'll also start with multiple CSV files for annotations - one per annotator. It's probably best to keep these separate (e.g. so we can add more later), but merge them into one big JSON file for use in the actual prompt.
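A sketch of that merge, reusing the hypothetical load_annotations and save_filtered helpers from the sketch under Requirements; the annotator_file provenance field is likewise made up:

```python
from pathlib import Path

def merge_annotator_csvs(csv_dir, out_json):
    """Leave the per-annotator CSVs untouched on disk, but write one
    merged JSON (tagging each row with its source file) for prompting."""
    merged = []
    for csv_path in sorted(Path(csv_dir).glob("*.csv")):
        for row in load_annotations(csv_path):
            row["annotator_file"] = csv_path.name  # keep provenance
            merged.append(row)
    save_filtered(merged, out_json)
```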