This is the repository for GRAPHENE (Ensembling Graph Predictions for AMR Parsing), a novel approach to ensembling semantic parsing and generation models, to be presented at NeurIPS 2021. A preprint of the paper is available on arXiv.
With GRAPHENE you can perform state-of-the-art Text-to-AMR ensemble parsing. If you find it useful, please star our GitHub repo and cite our work using the following BibTeX entry:
@inproceedings{graphene,
title={Ensembling Graph Predictions for AMR Parsing},
author={Hoang Thanh Lam and Gabriele Picco and Yufang Hou and Young-Suk Lee and Lam M. Nguyen and Dzung T. Phan and Vanessa López and Ramon Fernandez Astudillo},
year={2021},
booktitle={Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual}
}
In many machine learning tasks, models are trained to predict structured data such as graphs. For example, in natural language processing it is very common to parse texts into dependency trees or Abstract Meaning Representation (AMR) graphs. On the other hand, ensemble methods combine predictions from multiple models to create a new one that is more robust and accurate than any individual prediction. Many ensembling techniques have been proposed in the literature for classification and regression problems; however, the problem of ensemble graph prediction has not been studied thoroughly. In this work, we formalize this problem as mining the largest subgraph that is the most supported by a collection of graph predictions. As the problem is NP-hard, we propose an efficient heuristic algorithm to approximate the optimal solution. To validate our approach, we carried out experiments on AMR parsing. The experimental results demonstrate that the proposed approach can combine the strengths of state-of-the-art AMR parsers to create new predictions that are more accurate than any individual model on five standard benchmark datasets.
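The notion of a subgraph "supported" by several predictions can be illustrated with a deliberately simplified sketch: majority voting over AMR triples. This is not the GRAPHENE heuristic itself. It sidesteps the variable-alignment problem by assuming that node variables are already aligned across the input graphs, so identical triples denote the same node or edge. The names `support_vote` and `min_support` are hypothetical and not part of this repo.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# An AMR triple: (source variable, relation, target), e.g. ("w", ":instance", "want-01").
Triple = Tuple[str, str, str]


def support_vote(graphs: Iterable[Set[Triple]], min_support: float = 0.5) -> Set[Triple]:
    """Keep every triple that occurs in more than `min_support` of the input graphs.

    Assumption (hypothetical simplification): variables are already aligned
    across graphs, so equal triples refer to the same node or edge.
    """
    graphs = list(graphs)
    if not graphs:
        return set()
    # Each graph is a set, so it contributes at most one vote per triple.
    counts = Counter(t for g in graphs for t in g)
    threshold = min_support * len(graphs)
    return {t for t, c in counts.items() if c > threshold}


# Three toy predictions for "The boy wants ..." that disagree on some triples.
g1 = {("w", ":instance", "want-01"), ("b", ":instance", "boy"), ("w", ":ARG0", "b")}
g2 = {("w", ":instance", "want-01"), ("b", ":instance", "boy"), ("w", ":ARG1", "b")}
g3 = {("w", ":instance", "want-01"), ("b", ":instance", "girl"), ("w", ":ARG0", "b")}

# Keeps want-01, boy and :ARG0, each supported by at least 2 of the 3 graphs.
ensemble = support_vote([g1, g2, g3])
```

With three graphs and the default threshold, a triple needs at least two votes, which mirrors the intuition of keeping only structure that most models agree on.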
pip install -r requirements.txt
To run GRAPHENE, you first need predictions from a set of models. The models used in the paper are:
Obtain the AMR 2.0 and AMR 3.0 datasets: https://catalog.ldc.upenn.edu/LDC2017T10 and https://catalog.ldc.upenn.edu/LDC2020T02
Obtain the Bio and LP (The Little Prince) AMR datasets from: https://amr.isi.edu/download.html
Preprocess the data to remove wiki tags. Wiki tags are references to Wikipedia entries; instead of predicting them, these models perform wikification with BLINK as a post-processing step.
The following command preprocesses AMR 2.0; the other datasets can be preprocessed in the same way.
python -u -m amr_utils.preprocess.preprocess_amr -i LDC2017T10/data/amrs/split \
-o LDC2017T10/preprocessed_data/
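As a rough illustration of what removing wiki tags means, the sketch below deletes `:wiki` relations (whose value is either a quoted Wikipedia title or `-`) from a PENMAN-serialized AMR string with a regular expression. `strip_wiki` is a hypothetical helper, not the implementation in `amr_utils.preprocess`, which may handle more cases.

```python
import re

# A :wiki relation plus its value: a quoted string (allowing escaped quotes) or "-".
# The leading \s* also removes the whitespace/newline that precedes the relation.
WIKI_RE = re.compile(r'\s*:wiki\s+(?:"(?:[^"\\]|\\.)*"|-)')


def strip_wiki(amr: str) -> str:
    """Return the AMR string with every :wiki relation removed."""
    return WIKI_RE.sub("", amr)


single_line = '(c / country :wiki "France" :name (n / name :op1 "France"))'
multi_line = '(p / person\n   :wiki -\n   :name (n / name :op1 "Bob"))'

cleaned = strip_wiki(single_line)  # '(c / country :name (n / name :op1 "France"))'
```

The `:name` subgraph is left untouched, so the entity mention is still available for a later wikification step.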
Training T5 models and obtaining predictions
python -u -m amr_parsing.t5.cli.train --train "./LDC2017T10/preprocessed_data/train.txt.features.nowiki" \
--validation ./LDC2017T10/preprocessed_data/dev.txt.features.nowiki \
--report_test ./LDC2017T10/preprocessed_data/test.txt.features.nowiki \
--max_source_length 512 --max_target_length 512 --batch 8 -e 30 -m t5-large \
--model_type t5 --output ./t5_amr/ --data_type "amrdata" \
--task_type "text2amr" --val_from_epoch 10
Scoring (generating predictions for the test set)
python -u -m amr_parsing.cli.parser --test LDC2017T10/preprocessed_data/test.txt.features.nowiki \
--max_source_length 512 --max_target_length 512 --batch 4 -m t5-large --model_type t5 \
--output LDC2017T10/preprocessed_data/t5_amr_prediction.txt --data_type "amrdata" --task_type "text2amr" \
--checkpoint t5_amr/multitask.model
Wikification
To reproduce our results, you will also need to run the BLINK entity linking system on the prediction file. To do so, install BLINK and download its models:
git clone https://github.com/facebookresearch/BLINK.git
cd BLINK
pip install -r requirements.txt
pip install -e .
sh download_blink_models.sh
cd models
wget http://dl.fbaipublicfiles.com/BLINK//faiss_flat_index.pkl
cd ../..
Then you can launch the run_blink_wiki_adder.py script:
python -u -m amr_utils.blinkify.run_blink_wiki_adder \
-i LDC2017T10/preprocessed_data/t5_amr_prediction.txt \
-o LDC2017T10/preprocessed_data/ \
--blink-models-dir ../BLINK/models/
The output file with wikifications will be written to the output folder.
For SPRING, APT and Cai&Lam, we used the available pretrained checkpoints.
Once predictions from all models are available, run GRAPHENE on them:
python -u -m ensemble.graphene --gold gold_amr.txt --data "prediction_model_1 prediction_model_2 ...."
The results are written to the default output file graphene_smatch.txt in the directory where the command is run. To choose which algorithm to use, specify the algorithm flag with one of the following options:
To compute comparable Smatch scores, you will also need the scripts available at https://github.com/mdtux89/amr-evaluation. Note that scores computed with these scripts are about 0.3 points lower than those computed with Smatch 1.0.4. Following https://github.com/SapienzaNLP/spring, the results reported in our paper are computed with https://github.com/mdtux89/amr-evaluation instead of Smatch 1.0.4.
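Smatch compares a predicted and a gold AMR as sets of triples: once a mapping between the variables of the two graphs is fixed, the score is the F1 of matched triples. The sketch below computes that F1 assuming the variable mapping is already given; the real Smatch tool searches for the best mapping with hill climbing and random restarts, which is omitted here. `triple_f1` is a hypothetical name.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def triple_f1(pred: Set[Triple], gold: Set[Triple]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of matched triples under a fixed variable mapping."""
    matched = len(pred & gold)
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = {("a", ":instance", "boy"), ("b", ":instance", "go-02"),
        ("b", ":ARG0", "a"), ("b", ":polarity", "-")}
pred = {("a", ":instance", "boy"), ("b", ":instance", "go-02"), ("b", ":ARG0", "a")}

# The prediction misses the negation: precision 1.0, recall 0.75, F1 = 6/7.
p, r, f = triple_f1(pred, gold)
```

Reported Smatch numbers are usually this F1 multiplied by 100, which is the scale on which the 0.3-point difference above is expressed.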
The scripts provided in https://github.com/mdtux89/amr-evaluation require the files to be aligned (the AMRs in the prediction and gold files must appear in the same order). We provide a utility that aligns two AMR files based on their ::id tags:
python -u -m ensemble.align -g gold_amrs.txt -p predictions_amrs.txt
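Conceptually, alignment means reordering the prediction file so that it follows the `::id` order of the gold file. The sketch below is a hypothetical stand-in for `ensemble.align`, not its actual code. It assumes AMR entries are separated by blank lines and skips blocks without an `::id` line, such as the header block at the top of LDC files.

```python
import re
from typing import Dict, List, Optional

# Captures the identifier from a metadata line such as "# ::id bolt12_07_4800.1 ::date ...".
ID_RE = re.compile(r"#\s*::id\s+(\S+)")


def split_blocks(text: str) -> List[str]:
    """Split an AMR file into blank-line-separated blocks."""
    return [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]


def block_id(block: str) -> Optional[str]:
    """Return the ::id of a block, or None if it has none (e.g. a file header)."""
    m = ID_RE.search(block)
    return m.group(1) if m else None


def align_to_gold(gold_text: str, pred_text: str) -> str:
    """Return the predicted AMRs reordered to follow the ::id order of the gold file."""
    preds: Dict[str, str] = {}
    for block in split_blocks(pred_text):
        bid = block_id(block)
        if bid is not None:
            preds[bid] = block
    aligned = []
    for block in split_blocks(gold_text):
        gid = block_id(block)
        if gid is None:
            continue  # headers and other blocks without an id have no counterpart
        if gid not in preds:
            raise KeyError(f"prediction missing for ::id {gid}")
        aligned.append(preds[gid])
    return "\n\n".join(aligned) + "\n"


gold_file = "# AMR release\n\n# ::id a.1\n# ::snt Hi\n(h / hi)\n\n# ::id a.2\n# ::snt Bye\n(b / bye)\n"
pred_file = "# ::id a.2\n(b / bye-01)\n\n# ::id a.1\n(h / hello)\n"
aligned_file = align_to_gold(gold_file, pred_file)
```

Raising on a missing id, rather than silently dropping the entry, keeps the two files the same length, which is what the evaluation scripts expect.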