Iterative Memory-Based Joint OpenIE
A BERT-based OpenIE system that generates extractions using an iterative Seq2Seq model, as described in the ACL 2020 publication cited at the end of this README.
Use a Python 3.6 environment and install the dependencies using:
pip install -r requirements.txt
This will install custom versions of allennlp and pytorch_transformers based on the code in the folder.
All reported results are based on PyTorch 1.2 run on a Tesla V100 GPU (CUDA 10.0). Results may vary slightly in a different environment.
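Since results depend on the environment, a quick version check before training can help. The following is a minimal sketch, not part of the repository: the helper names `version_matches` and `environment_warnings` are hypothetical, and the expected versions are simply the ones stated above.

```python
import sys

# Versions stated in this README; the reported results used these.
EXPECTED_PYTHON = (3, 6)
EXPECTED_TORCH = "1.2"


def version_matches(found, expected):
    """Return True if `found` starts with the `expected` major.minor prefix."""
    # Drop local build tags such as "+cu100" before comparing components.
    found_parts = found.split("+")[0].split(".")
    expected_parts = expected.split(".")
    return found_parts[: len(expected_parts)] == expected_parts


def environment_warnings():
    """Collect human-readable warnings about version mismatches."""
    warnings = []
    if sys.version_info[:2] != EXPECTED_PYTHON:
        warnings.append(
            "Python {}.{} found, README expects {}.{}".format(
                sys.version_info[0], sys.version_info[1], *EXPECTED_PYTHON
            )
        )
    try:
        import torch  # imported lazily so the check also works without torch
    except ImportError:
        warnings.append("torch is not installed")
    else:
        if not version_matches(torch.__version__, EXPECTED_TORCH):
            warnings.append(
                "torch {} found, README expects {}.x".format(
                    torch.__version__, EXPECTED_TORCH
                )
            )
    return warnings
```

Calling `environment_warnings()` returns an empty list when both versions match. The check does not cover the GPU model or CUDA version.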
bash download_data.sh
This downloads the train, dev, and test data.
IMoJIE (on OpenIE-4, ClausIE, RnnOIE bootstrapping data with QPBO filtering)
python allennlp_script.py --param_path imojie/configs/imojie.json --s models/imojie --mode train_test
Arguments:
Important baselines:
IMoJIE (on OpenIE-4 bootstrapping)
python allennlp_script.py --param_path imojie/configs/ba.json --s models/ba --mode train_test
CopyAttn+BERT (on OpenIE-4 bootstrapping)
python allennlp_script.py --param_path imojie/configs/be.json --s models/be --mode train_test --type single --beam_size 3
Score using bert_encoder trained on oie4:
python imojie/aggregate/score.py --model_dir models/score/be --inp_fp data/train/4cr_comb_extractions.tsv --out_fp data/train/4cr_comb_extractions.tsv.be
Score using bert_append trained on comb_4cr (random):
python imojie/aggregate/score.py --model_dir models/score/4cr_rand --inp_fp data/train/4cr_comb_extractions.tsv.be --out_fp data/train/4cr_comb_extractions.tsv.ba
Filter using QPBO:
python imojie/aggregate/filter.py --inp_fp data/train/4cr_comb_extractions.tsv.ba --out_fp data/train/4cr_qpbo_extractions.tsv
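The three steps above form a chain: each command reads the file written by the previous one. The following is a minimal sketch of a driver that runs them in order and stops at the first failure. The script paths and flags are copied from the commands above; the function names `build_pipeline` and `run_pipeline` are hypothetical, and `sys.executable` is used in place of `python` on the assumption that the current interpreter is the intended one.

```python
import subprocess
import sys

TRAIN_DIR = "data/train"
COMBINED = TRAIN_DIR + "/4cr_comb_extractions.tsv"


def build_pipeline():
    """Return the three commands from this README, in execution order."""
    python = sys.executable  # assumption: run with the current interpreter
    return [
        # 1. Score using bert_encoder trained on oie4.
        [python, "imojie/aggregate/score.py",
         "--model_dir", "models/score/be",
         "--inp_fp", COMBINED,
         "--out_fp", COMBINED + ".be"],
        # 2. Score using bert_append trained on comb_4cr (random).
        [python, "imojie/aggregate/score.py",
         "--model_dir", "models/score/4cr_rand",
         "--inp_fp", COMBINED + ".be",
         "--out_fp", COMBINED + ".ba"],
        # 3. Filter using QPBO.
        [python, "imojie/aggregate/filter.py",
         "--inp_fp", COMBINED + ".ba",
         "--out_fp", TRAIN_DIR + "/4cr_qpbo_extractions.tsv"],
    ]


def run_pipeline():
    """Run each step in order; check=True raises on the first failing step."""
    for command in build_pipeline():
        print("Running:", " ".join(command))
        subprocess.run(command, check=True)
```

Call `run_pipeline()` from the repository root so that the relative paths resolve.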
Until the day of paper submission, we internally called our model "bert-append" (ba) and CopyAttention + BERT "bert-encoder" (be), so you will find these names throughout the codebase. In this context, IMoJIE is bert-append trained on QPBO-filtered data.
Format: (Prec/Rec/F1-Optimal, AUC, Prec/Rec/F1-Last)
models/imojie/test/carb_1/best_results.txt: (64.70/45.60/53.50, 33.30, 63.80/45.80/53.30)
models/ba/test/carb_1/best_results.txt: (63.50/45.80/53.20, 33.10, 60.40/46.30/52.40)
models/be/test/carb_3/best_results.txt: (59.50/45.50/51.60, 32.80, 52.90/46.70/49.60)
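To compare runs programmatically, the result tuples can be parsed into named fields. The following is a minimal sketch that handles strings of the shape shown above. Whether best_results.txt stores its contents in exactly this shape is an assumption, and `parse_result` and `harmonic_f1` are hypothetical helper names.

```python
import re

# Shape shown in this README: (P/R/F1-Optimal, AUC, P/R/F1-Last)
RESULT_PATTERN = re.compile(
    r"\(\s*([\d.]+)/([\d.]+)/([\d.]+)\s*,\s*([\d.]+)\s*,"
    r"\s*([\d.]+)/([\d.]+)/([\d.]+)\s*\)"
)


def parse_result(text):
    """Parse the first result tuple found in `text` into a dict of floats."""
    match = RESULT_PATTERN.search(text)
    if match is None:
        raise ValueError("no result tuple found in: {!r}".format(text))
    values = [float(group) for group in match.groups()]
    return {
        "optimal": {"precision": values[0], "recall": values[1], "f1": values[2]},
        "auc": values[3],
        "last": {"precision": values[4], "recall": values[5], "f1": values[6]},
    }


def harmonic_f1(precision, recall):
    """F1 as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Example using the IMoJIE numbers listed above.
imojie = parse_result("(64.70/45.60/53.50, 33.30, 63.80/45.80/53.30)")
print(imojie["optimal"]["f1"], imojie["auc"])
```

`harmonic_f1` can serve as a sanity check on a parsed tuple: recomputing F1 from the listed precision and recall should land close to the listed F1, up to rounding.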
Downloading the pre-trained models:
zenodo_get 3779954
Downloading the data:
zenodo_get 3775983
Downloading the results:
zenodo_get 3780045
python standalone.py --inp input.txt --out output.txt
input.txt contains one sentence per line.
output.txt contains the corresponding OpenIE extractions.
This requires downloading the pre-trained models.
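The input file can be prepared with a few lines of Python. The following is a minimal sketch that writes one sentence per line; `write_sentences` is a hypothetical helper, and collapsing internal whitespace and dropping blank entries are assumptions about what keeps the one-sentence-per-line layout intact, not documented requirements of standalone.py.

```python
def write_sentences(sentences, path="input.txt"):
    """Write one sentence per line and return the number of lines written."""
    # Collapse whitespace (including stray newlines) so each sentence
    # occupies exactly one line.
    cleaned = [" ".join(sentence.split()) for sentence in sentences]
    cleaned = [sentence for sentence in cleaned if sentence]  # drop blanks
    with open(path, "w", encoding="utf-8") as handle:
        for sentence in cleaned:
            handle.write(sentence + "\n")
    return len(cleaned)
```

For example, `write_sentences(["First sentence.", "Second sentence."])` creates an input.txt with two lines, which can then be passed to standalone.py via `--inp`.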
If you use this code in your research, please cite:
@inproceedings{kolluru&al20,
title = "{IM}o{JIE}: {I}terative {M}emory-{B}ased {J}oint {O}pen {I}nformation {E}xtraction",
author = "Kolluru, Keshav and
Aggarwal, Samarth and
Rathore, Vipul and
Mausam, and
Chakrabarti, Soumen",
booktitle = "The 58th Annual Meeting of the Association for Computational Linguistics (ACL)",
month = July,
year = "2020",
address = {Seattle, U.S.A}
}
In case of any issues, please send an email to
keshav.kolluru (at) gmail (dot) com