DreamInvoker / GAIN

Source code for EMNLP 2020 paper: Double Graph Based Reasoning for Document-level Relation Extraction
MIT License

Double Graph Based Reasoning for Document-level Relation Extraction

Document-level relation extraction aims to extract relations among entities within a document. Different from sentence-level relation extraction, it requires reasoning over multiple sentences across a document. In this paper, we propose Graph Aggregation-and-Inference Network (GAIN) featuring double graphs. GAIN first constructs a heterogeneous mention-level graph (hMG) to model complex interaction among different mentions across the document. It also constructs an entity-level graph (EG), based on which we propose a novel path reasoning mechanism to infer relations between entities. Experiments on the public dataset, DocRED, show GAIN achieves a significant performance improvement (2.85 on F1) over the previous state-of-the-art.

1. Environments

2. Dependencies

PS: dgl >= 0.5 is not compatible with our code; we will fix this compatibility problem in the future.
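As a rough illustration of the version constraint above, the following sketch checks whether a dgl version string falls below 0.5. The helper name `dgl_version_supported` is hypothetical and is not part of this repository; it only compares the leading major and minor numbers.

```python
import re


def dgl_version_supported(version: str) -> bool:
    """Return True when a dgl version string is below 0.5.

    Hypothetical helper for illustration; not part of the GAIN code base.
    """
    # Read only the leading "major.minor" part, e.g. "0.4.3post2" -> (0, 4).
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # The README states that dgl >= 0.5 is not compatible with the code.
    return (major, minor) < (0, 5)
```

With dgl installed, the check could be applied as `dgl_version_supported(dgl.__version__)` before launching training.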

3. Preparation

3.1. Dataset

3.2. (Optional) Pre-trained Language Models

Following the hint in this link, download the files that may be required (pytorch_model.bin, config.json, vocab.txt, etc.) into the directory PLM/bert-????-uncased, such as PLM/bert-base-uncased.
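Before training, it may help to confirm that the downloaded files are where the code expects them. The sketch below is a minimal check under the assumption that the three files named above are the ones needed; the "etc." in the list means more files may be required, and the helper name `missing_plm_files` is hypothetical.

```python
from pathlib import Path

# File names taken from the README text; this list may be incomplete.
EXPECTED_FILES = ("pytorch_model.bin", "config.json", "vocab.txt")


def missing_plm_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    # Keep the order of `expected` so the report is stable.
    return [name for name in expected if not (root / name).is_file()]
```

For example, `missing_plm_files("PLM/bert-base-uncased")` returns an empty list when all three files are present in that directory.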

4. Training

>> cd code
>> ./runXXX.sh gpu_id   # like ./run_GAIN_BERT.sh 2
>> tail -f -n 2000 logs/train_xxx.log

5. Evaluation

>> cd code
>> ./evalXXX.sh gpu_id threshold(optional)  # like ./eval_GAIN_BERT.sh 0 0.5521
>> tail -f -n 2000 logs/test_xxx.log

PS: we recommend using threshold = -1 on the dev set (this is the default, so you can omit the argument). The log will then print the optimal threshold on the dev set, and you can pass that value as the threshold when evaluating the test set.
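To make the idea of an "optimal threshold on the dev set" concrete, here is a minimal sketch of one common way to pick it: rank the scored predictions and keep the score cut-off that maximizes F1. This is an illustration, not the repository's evaluation code. The function name `best_f1_threshold` is hypothetical, and the sketch assumes every gold relation fact appears among the scored candidates.

```python
def best_f1_threshold(scores, labels):
    """Pick the score threshold that maximizes F1 on a labeled dev set.

    scores: list of floats, one per candidate relation fact.
    labels: parallel list of bools, True when the candidate is a gold fact.
    Returns (threshold, f1); a candidate is predicted when score >= threshold.
    Illustrative sketch only, not the evaluation code of this repository.
    """
    total_gold = sum(1 for is_gold in labels if is_gold)
    if total_gold == 0:
        raise ValueError("the dev set contains no gold facts")
    # Sort candidates from the most to the least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    best_thr, best_f1 = None, 0.0
    true_pos = 0
    for i, (score, is_gold) in enumerate(ranked, start=1):
        true_pos += int(is_gold)
        # Candidates with equal scores are accepted together, so only
        # evaluate F1 at the last member of each tie group.
        if i < len(ranked) and ranked[i][0] == score:
            continue
        if true_pos == 0:
            continue
        precision = true_pos / i
        recall = true_pos / total_gold
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_thr, best_f1 = score, f1
    return best_thr, best_f1
```

A threshold found this way on the dev set plays the role of the value passed as the second argument in the evaluation command above.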

6. Submission to Leaderboard (CodaLab)

7. License

This project is licensed under the MIT License - see the LICENSE file for details.

8. Citation

If you use this work or code, please kindly cite the following paper:

@inproceedings{zeng-etal-2020-gain,
    title = "Double Graph Based Reasoning for Document-level Relation Extraction",
    author = "Zeng, Shuang  and
      Xu, Runxin  and
      Chang, Baobao  and
      Li, Lei",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    year = "2020",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.emnlp-main.127",
    pages = "1630--1640",
}

9. Contacts

If you have any questions, please feel free to contact Shuang Zeng; we will reply as soon as possible.