The MultiVerS model

This is the repository for the MultiVerS model for scientific claim verification, described in the NAACL Findings 2022 paper MultiVerS: Improving scientific claim verification with weak supervision and full-document context.

MultiVerS was formerly known as LongChecker. It's the exact same model; we just changed the name to emphasize different aspects of the modeling approach. I'm still in the process of changing the filenames within this repo.

We provide data, model checkpoints, training and inference code for models trained on three scientific claim verification datasets: SciFact, CovidFact, and HealthVer (see below for details). While the SciFact test set is not public, predictions made using the SciFact checkpoint will reproduce the results in the preprint and on the SciFact leaderboard.

Update (January 2023): Code and data to train the models are now available. Apologies for the delay.

Update (May 2022): Apologies for the delay in getting the training code up. I will make sure that it is available by the time the work is presented at NAACL 2022, if not sooner.

Disclaimer: This software is intended to be used as a research prototype, and its outputs shouldn't be used to inform any medical decisions.

Setup

We recommend setting up a Conda environment:

    conda create --name multivers python=3.8 conda-build

Then, install required packages:

    pip install -r requirements.txt

Next, call conda develop . from the root of this repository.

Then, download the Longformer checkpoint from which all the fact-checking models are finetuned:

    python script/get_checkpoint.py longformer_large_science

Running inference

First, download the processed versions of the data by running bash script/get_data.sh. This will download the CovidFact, HealthVer, and SciFact datasets into the data directory.
Then, download the model checkpoint you'd like to make predictions with using
```
python script/get_checkpoint.py [checkpoint_name]
```
Available models are listed in the Model checkpoints section below.
Make predictions using the convenience wrapper script script/predict.sh. This script accepts a dataset name as an argument, and makes predictions using the correct input files and model checkpoint for that dataset. For instance, to make predictions on the SciFact test set using the version of MultiVerS trained on SciFact, do:
```
bash script/predict.sh scifact
```
For more control over the models and datasets used for prediction, you can use multivers/predict.py.

Model checkpoints

The following model checkpoints are available. You can download them using script/get_checkpoint.py, as shown in the commands above.

fever: MultiVerS trained on FEVER.
fever_sci: MultiVerS trained on FEVER, plus two weakly-supervised scientific datasets: PubMedQA and Evidence Inference.
covidfact: Finetuned on CovidFact, starting from the fever_sci checkpoint.
healthver: Finetuned on HealthVer.
scifact: Finetuned on SciFact.
longformer_large_science: Longformer pre-trained on a corpus of scientific documents. This model has not been trained on any fact-checking data; it's the starting point for all other models.

You can also download all models by passing all as the checkpoint name to script/get_checkpoint.py.
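
If you only need a subset of the checkpoints, the download commands can be generated from the names listed above. The following is a minimal sketch, not part of the repository: the helper `checkpoint_commands` is hypothetical, and it assumes the download script takes a single checkpoint name, as in the `python script/get_checkpoint.py [checkpoint_name]` command shown earlier. It only builds and prints the commands; it does not download anything.

```python
# Checkpoint names copied from the list above.
KNOWN_CHECKPOINTS = (
    "fever",
    "fever_sci",
    "covidfact",
    "healthver",
    "scifact",
    "longformer_large_science",
)


def checkpoint_commands(names):
    """Return one download command (as an argument list) per checkpoint name.

    Hypothetical helper: it rejects names that are not in the list above so
    that typos are caught before any download starts.
    """
    unknown = [name for name in names if name not in KNOWN_CHECKPOINTS]
    if unknown:
        raise ValueError(f"Unknown checkpoint name(s): {unknown}")
    return [["python", "script/get_checkpoint.py", name] for name in names]


if __name__ == "__main__":
    # Print the commands for two of the finetuned models.
    for command in checkpoint_commands(["scifact", "healthver"]):
        print(" ".join(command))
```

To actually download, each argument list can be passed to `subprocess.run(command, check=True)` from the root of the repository.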

Evaluating predictions

The SciFact test set is private, but the test sets for HealthVer and CovidFact are included in the data download. To evaluate model predictions, use the scifact-evaluator code. Clone the repo, then use the evaluation script located at evaluator/eval.py. This script accepts two files:

Predictions, as output by multivers/predict.py
Gold labels, which are included in the data download.

It will evaluate the predictions with respect to gold and save metrics to a file. See the evaluation script for more details.

Making predictions for new datasets

You should be able to use one of the MultiVerS checkpoints to make predictions for new data. First, you'll need to write a script to convert your dataset to the format described in data.md. Then, choose which model you'd like to use. If you don't know which one is best, we'd suggest:

fever for Wikipedia or general text.
healthver for claims specifically about COVID-19.
scifact for biomedical claims generally.

Once you've got your model and dataset chosen, you can make predictions as follows:

    python multivers/predict.py \
        --checkpoint_path=checkpoints/[model_name].ckpt \
        --input_file=[path_to_your_claims] \
        --corpus_file=[path_to_your_corpus] \
        --output_file=[output_path]
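
The same call can be scripted from Python, for example to run predictions over several datasets in a loop. This is a minimal sketch under stated assumptions: `SUGGESTED_CHECKPOINT` and `build_predict_command` are hypothetical helpers, not part of the repository; the domain labels used as dictionary keys are illustrative; and the file names in the usage example are placeholders. Only the four flags shown above are used.

```python
# Suggested checkpoint per domain, following the recommendations above.
# The keys are illustrative labels, not names used by the repository.
SUGGESTED_CHECKPOINT = {
    "general": "fever",        # Wikipedia or general text
    "covid": "healthver",      # claims specifically about COVID-19
    "biomedical": "scifact",   # biomedical claims generally
}


def build_predict_command(domain, input_file, corpus_file, output_file):
    """Build the argument list for multivers/predict.py (hypothetical helper)."""
    model_name = SUGGESTED_CHECKPOINT[domain]
    return [
        "python",
        "multivers/predict.py",
        f"--checkpoint_path=checkpoints/{model_name}.ckpt",
        f"--input_file={input_file}",
        f"--corpus_file={corpus_file}",
        f"--output_file={output_file}",
    ]


if __name__ == "__main__":
    # Placeholder paths; replace them with your converted dataset files.
    command = build_predict_command(
        "biomedical", "my_claims.jsonl", "my_corpus.jsonl", "my_predictions.jsonl"
    )
    print(" ".join(command))
```

To execute the command, pass the list to `subprocess.run(command, check=True)` from the root of the repository, after downloading the corresponding checkpoint.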

Model training

Code is now available to train MultiVerS. See training.md for details.

GPT-3 baseline

I've added some code to do very un-optimized few-shot prediction using GPT-3. To run it, do bash script/predict_gpt3.sh. For info on the prompt used and the performance achieved, see gpt3_baseline.md.
