dwadden / multivers

Code and model checkpoints for the MultiVerS model for scientific claim verification.
MIT License

The MultiVerS model

This is the repository for the MultiVerS model for scientific claim verification, described in the NAACL Findings 2022 paper MultiVerS: Improving scientific claim verification with weak supervision and full-document context.

MultiVerS was formerly known as LongChecker. It's the exact same model; we just changed the name to emphasize different aspects of the modeling approach. I'm still in the process of changing the filenames within this repo.

We provide data, model checkpoints, training and inference code for models trained on three scientific claim verification datasets: SciFact, CovidFact, and HealthVer (see below for details). While the SciFact test set is not public, predictions made using the SciFact checkpoint will reproduce the results in the preprint and on the SciFact leaderboard.

Update (January 2023): Code and data to train the models are now available. Apologies for the delay.

Update (May 2022): Apologies for the delay in getting the training code up. I will make sure that it is available by the time the work is presented at NAACL 2022, if not sooner.

Disclaimer: This software is intended to be used as a research prototype, and its outputs shouldn't be used to inform any medical decisions.

Setup

We recommend setting up a Conda environment:

conda create --name multivers python=3.8 conda-build

Then, install required packages:

pip install -r requirements.txt

Next, call conda develop . from the root of this repository.

Then, download the Longformer checkpoint from which all the fact-checking models are finetuned by running:

python script/get_checkpoint.py longformer_large_science

Running inference

Model checkpoints

The following model checkpoints are available. You can download them using script/get_checkpoint.py.

You can also download all models by passing all to get_checkpoint.py.
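To confirm which checkpoints have been downloaded, a small helper like the one below may be useful. It is only a sketch: it assumes that checkpoints are stored as `.ckpt` files in a `checkpoints/` directory at the repository root, which is inferred from the `--checkpoint_path=checkpoints/[model_name].ckpt` argument used for prediction. The function name `list_checkpoints` is hypothetical and not part of this repository.

```python
from pathlib import Path


def list_checkpoints(checkpoint_dir="checkpoints"):
    """Return the sorted names (without extension) of all .ckpt files in a directory.

    Assumption: checkpoints live in `checkpoint_dir` and end in `.ckpt`.
    Returns an empty list if the directory does not exist.
    """
    directory = Path(checkpoint_dir)
    if not directory.is_dir():
        return []
    return sorted(path.stem for path in directory.glob("*.ckpt"))


# Print whatever is currently available in the assumed default location.
print(list_checkpoints())
```

An empty list most likely means the download script has not been run yet, or that the checkpoints were saved somewhere other than the assumed directory.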

Evaluating predictions

The SciFact test set is private, but the test sets for HealthVer and CovidFact are included in the data download. To evaluate model predictions, use the scifact-evaluator code. Clone the repo, then use the evaluation script located at evaluator/eval.py. This script accepts two files:

  1. Predictions, as output by multivers/predict.py
  2. Gold labels, which are included in the data download.

It will evaluate the predictions against the gold labels and save the metrics to a file. See the evaluation script for more details.
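The scifact-evaluator script remains the reference for any reported numbers. For a quick sanity check before running it, the sketch below computes a simplified label-only precision, recall, and F1 over (claim, document) pairs. It rests on assumptions that should be checked against the actual files: that both files are JSON lines, that each prediction looks like `{"id": ..., "evidence": {doc_id: {"label": ..., "sentences": [...]}}}`, and that each gold entry looks like `{"id": ..., "evidence": {doc_id: [{"label": ..., "sentences": [...]}]}}`. All function names are hypothetical, and the metric ignores rationale sentences, so it will not necessarily match the official scores.

```python
import json


def load_jsonl(path):
    """Read a JSON-lines file into a list of dicts, skipping blank lines."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]


def gold_pairs(gold):
    """Map (claim_id, doc_id) -> gold label (assumed gold format, see lead-in)."""
    pairs = {}
    for claim in gold:
        for doc_id, evidence_sets in claim.get("evidence", {}).items():
            if evidence_sets:
                # Assumes all evidence sets for one document share a label.
                pairs[(claim["id"], str(doc_id))] = evidence_sets[0]["label"]
    return pairs


def predicted_pairs(preds):
    """Map (claim_id, doc_id) -> predicted label (assumed prediction format)."""
    pairs = {}
    for pred in preds:
        for doc_id, entry in pred.get("evidence", {}).items():
            pairs[(pred["id"], str(doc_id))] = entry["label"]
    return pairs


def label_only_f1(gold, preds):
    """Simplified precision / recall / F1 over labeled (claim, document) pairs."""
    gold_map = gold_pairs(gold)
    pred_map = predicted_pairs(preds)
    correct = sum(1 for key, label in pred_map.items() if gold_map.get(key) == label)
    precision = correct / len(pred_map) if pred_map else 0.0
    recall = correct / len(gold_map) if gold_map else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

With files on disk, this could be called as `label_only_f1(load_jsonl(gold_path), load_jsonl(predictions_path))`, where both paths are placeholders for your own files.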

Making predictions for new datasets

You should be able to use one of the MultiVerS checkpoints to make predictions for new data. First, you'll need to write a script to convert your dataset to the format described in data.md. Then, choose which model you'd like to use. If you don't know which one is best, we'd suggest:

Once you've got your model and dataset chosen, you can make predictions as follows:

    python multivers/predict.py \
        --checkpoint_path=checkpoints/[model_name].ckpt \
        --input_file=[path_to_your_claims] \
        --corpus_file=[path_to_your_corpus] \
        --output_file=[output_path]
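The conversion script mentioned above might look roughly like the following sketch. data.md is the authoritative description of the format; this example assumes a SciFact-style layout in which the claims file is JSON lines with `id`, `claim`, and `doc_ids` fields, and the corpus file is JSON lines with `doc_id`, `title`, and `abstract` (a list of sentences) fields. Those field names, and the helper names `build_claims`, `build_corpus`, and `write_jsonl`, are assumptions made for illustration.

```python
import json


def build_corpus(documents):
    """Convert (doc_id, title, sentences) tuples to corpus records.

    Assumed corpus fields: doc_id, title, abstract (list of sentences).
    """
    return [
        {"doc_id": doc_id, "title": title, "abstract": list(sentences)}
        for doc_id, title, sentences in documents
    ]


def build_claims(claims):
    """Convert (claim_id, claim_text, candidate_doc_ids) tuples to claim records.

    Assumed claim fields: id, claim, doc_ids (documents to check the claim against).
    """
    return [
        {"id": claim_id, "claim": text, "doc_ids": list(doc_ids)}
        for claim_id, text, doc_ids in claims
    ]


def write_jsonl(records, path):
    """Write one JSON object per line."""
    with open(path, "w") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
```

After building the records from your own data, write them out with `write_jsonl` and pass the two resulting paths as `--input_file` and `--corpus_file` in the prediction command.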

Model training

Code is now available to train MultiVerS. See training.md for details.

GPT-3 baseline

I've added some code to do very un-optimized few-shot prediction using GPT-3. To run it, do bash script/predict_gpt3.sh. For info on the prompt used and the performance achieved, see gpt3_baseline.md.