JointLK: Joint Reasoning with Language Models and Knowledge Graphs for Commonsense Question Answering

This repo provides the source code and data for our paper: JointLK: Joint Reasoning with Language Models and Knowledge Graphs for Commonsense Question Answering (NAACL 2022).

For convenience, all data, checkpoints, and code can be downloaded from my Baidu Netdisk.

1. Dependencies

Run the following commands to create a conda environment (assuming CUDA 11.0, which the cu110 wheels below target):

conda create -n jointlk python=3.7
conda activate jointlk
pip install torch==1.7.1+cu110 -f https://download.pytorch.org/whl/torch_stable.html
pip install transformers==3.2.0
pip install nltk spacy==2.1.6
python -m spacy download en
# for torch-geometric
pip install torch-cluster==1.5.9 -f https://pytorch-geometric.com/whl/torch-1.7.1+cu110.html
pip install torch-spline-conv==1.2.1 -f https://pytorch-geometric.com/whl/torch-1.7.1+cu110.html
pip install torch-scatter==2.0.6 -f https://pytorch-geometric.com/whl/torch-1.7.1+cu110.html
pip install torch-sparse==0.6.9 -f https://pytorch-geometric.com/whl/torch-1.7.1+cu110.html
pip install torch-geometric==1.6.3 -f https://pytorch-geometric.com/whl/torch-1.7.1+cu110.html

See the file env.yaml for all environment dependencies.
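
After installation, it can help to confirm that the pinned versions actually ended up in the environment. The following is a minimal sketch and not part of this repo: the PINNED table only restates the versions from the pip commands above, and installed_version and find_mismatches are hypothetical helper names introduced for illustration.

```python
from typing import Callable, Dict, Optional, Tuple

# Versions restated from the pip commands above (illustrative; not read from env.yaml).
PINNED: Dict[str, str] = {
    "torch": "1.7.1+cu110",
    "transformers": "3.2.0",
    "spacy": "2.1.6",
    "torch-cluster": "1.5.9",
    "torch-spline-conv": "1.2.1",
    "torch-scatter": "2.0.6",
    "torch-sparse": "0.6.9",
    "torch-geometric": "1.6.3",
}


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not found."""
    try:
        from importlib.metadata import version  # available on Python 3.8+
    except ImportError:
        try:
            import pkg_resources  # fallback for Python 3.7 (provided by setuptools)
        except ImportError:
            return None
        try:
            return pkg_resources.get_distribution(name).version
        except Exception:
            return None
    try:
        return version(name)
    except Exception:
        return None


def find_mismatches(
    pinned: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package whose installed version differs to (wanted, found)."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {wanted}, found {found}")
```

An empty result means every pinned package was found at the expected version. The comparison is an exact string match, which is an assumption of this sketch; a differently built wheel may report a slightly different version string.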

2. Download Data

We use preprocessed data from the QA-GNN repository, which can also be downloaded from my Baidu Netdisk.

The data file structure will look like:

├── data/
    ├── cpnet/                 (preprocessed ConceptNet)
    ├── csqa/
        ├── train_rand_split.jsonl
        ├── dev_rand_split.jsonl
        ├── test_rand_split_no_answers.jsonl
        ├── statement/             (converted statements)
        ├── grounded/              (grounded entities)
        ├── graphs/                (extracted subgraphs)
        ├── ...
    ├── obqa/
    ├── medqa_usmle/
    └── ddb/
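
Before launching a job, the layout above can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: the EXPECTED list is copied from the tree above and covers only the entries shown there (the elided "..." entries and the contents of obqa/, medqa_usmle/ and ddb/ are not checked), and missing_entries is a hypothetical helper name, not a function from this repo.

```python
from pathlib import Path
from typing import List, Sequence

# Entries copied from the tree above, relative to the data/ directory.
# Directories whose contents the README does not list are checked for existence only.
EXPECTED = [
    "cpnet",
    "csqa/train_rand_split.jsonl",
    "csqa/dev_rand_split.jsonl",
    "csqa/test_rand_split_no_answers.jsonl",
    "csqa/statement",
    "csqa/grounded",
    "csqa/graphs",
    "obqa",
    "medqa_usmle",
    "ddb",
]


def missing_entries(data_root: str, expected: Sequence[str] = EXPECTED) -> List[str]:
    """Return the expected relative paths that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_entries("data")
    if missing:
        print("Missing under data/:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All listed entries are present.")
```

The check only tests for existence; it does not validate file contents.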

3. Training JointLK

(Assuming the Slurm job scheduling system)

For CommonsenseQA, run

sbatch sbatch_run_jointlk__csqa.sh

For OpenBookQA, run

sbatch sbatch_run_jointlk__obqa.sh

4. Pretrained model checkpoints


CommonsenseQA

Trained model                     In-house Dev acc.   In-house Test acc.
RoBERTa-large + JointLK [link]    77.6                75.3
RoBERTa-large + JointLK [link]    78.4                74.2


OpenBookQA

Trained model                           Dev acc.   Test acc.
RoBERTa-large + JointLK [link]          68.8       70.4
AristoRoBERTa-large + JointLK [link]    79.2       85.6

5. Evaluating a pretrained model checkpoint

For CommonsenseQA, run

sbatch sbatch_run_jointlk__csqa_test.sh

For OpenBookQA, run

sbatch sbatch_run_jointlk__obqa_test.sh

6. Acknowledgment

This repo is built upon the following work:

QA-GNN: Question Answering using Language Models and Knowledge Graphs

Many thanks to the authors and developers!


We noticed that the QA-GNN repository added test results on the MedQA dataset. To make it easier for future researchers to compare models, we also evaluated JointLK on MedQA.

To train on MedQA, run

sbatch sbatch_run_jointlk__medqa_usmle.sh

To test on MedQA, run

sbatch sbatch_run_jointlk__medqa_usmle_test.sh
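
The training and testing scripts named in this README share one naming pattern, so the sbatch command can be built from a dataset name. The sketch below is only an illustration: the pattern is inferred from the six script names that appear in this README, and sbatch_command is a hypothetical helper, not part of the repo.

```python
from typing import List

# Dataset keys as they appear in the script names in this README.
DATASETS = ("csqa", "obqa", "medqa_usmle")


def sbatch_command(dataset: str, test: bool = False) -> List[str]:
    """Build the sbatch command for training (default) or testing a dataset."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset {dataset!r}; expected one of {DATASETS}")
    suffix = "_test" if test else ""
    return ["sbatch", f"sbatch_run_jointlk__{dataset}{suffix}.sh"]


if __name__ == "__main__":
    # Print the commands rather than submitting them.
    for name in DATASETS:
        print(" ".join(sbatch_command(name)))
        print(" ".join(sbatch_command(name, test=True)))
```

Passing the returned list to subprocess.run(..., check=True) from the repository root would submit the job, assuming sbatch is on the PATH.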

A pretrained model checkpoint

Trained model                    Dev acc.   Test acc.
SapBERT-base + JointLK [link]    38.0       39.8