ChunchuanLv / AMR_AS_GRAPH_PREDICTION

AMR AS GRAPH PREDICTION

This repository contains code for training and using the Abstract Meaning Representation (AMR) model described in the paper "AMR Parsing as Graph Prediction with Latent Alignment".

If you use our code, please cite our paper as follows:

@inproceedings{Lyu2018AMRPA,
    title={AMR Parsing as Graph Prediction with Latent Alignment},
    author={Chunchuan Lyu and Ivan Titov},
    booktitle={Proceedings of the Annual Meeting of the Association for Computational Linguistics},
    year={2018}
}

Prerequisites

Configuration

Preprocessing

Either a) combine all *.txt files into a single file, and use Stanford CoreNLP to extract NER tags, POS tags, and lemmas. The processed file is saved in the same folder.

python src/preprocessing.py

or b) start from the output of the AMR-to-English aligner and process it with the Java code in AMR_FEATURE (I used Eclipse to run it).
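For option a), the "combine all *.txt files" step can be sketched as below. This is a minimal illustration, not part of the repository: the function name `combine_txt_files` and the output name `combined.txt` are assumptions, and the file name that src/preprocessing.py actually expects should be checked in the code.

```python
from pathlib import Path


def combine_txt_files(folder, output_name="combined.txt"):
    """Concatenate every *.txt file in `folder` into one file in that folder.

    `output_name` is a hypothetical default, not a name used by the repository.
    """
    folder = Path(folder)
    output_path = folder / output_name
    # Sort for a reproducible order; skip the output file itself on re-runs.
    parts = sorted(p for p in folder.glob("*.txt") if p.name != output_name)
    with output_path.open("w", encoding="utf-8") as out:
        for part in parts:
            text = part.read_text(encoding="utf-8")
            out.write(text)
            # AMR entries are separated by blank lines, so make sure each
            # file ends with one before the next file starts.
            if not text.endswith("\n\n"):
                out.write("\n" if text.endswith("\n") else "\n\n")
    return output_path
```

Calling `combine_txt_files("path/to/amr/split")` writes the merged file next to the inputs, which matches the "saved in the same folder" convention described above.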

Build the copying dictionary and recategorization system (this step can be skipped, as both are already in data/).

python src/rule_system_build.py

Convert the data into tensors.

python src/data_build.py

Training

By default, the model is saved to [save_to]/gpus_0valid_best.pt (save_to is defined in constants.py).

python src/train.py

Testing

Load a trained model and parse the pre-built data.

python src/generate.py -train_from [gpus_0valid_best.pt]
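The steps from Preprocessing through Testing can be chained in order. The sketch below only assembles the commands listed above and, by default, prints them without running anything. The helper names `build_commands` and `run_all` are hypothetical, and it assumes the scripts are launched from the repository root.

```python
import subprocess
import sys

# Script order taken from the sections above.
STEPS = [
    "src/preprocessing.py",
    "src/rule_system_build.py",  # optional: its outputs are already in data/
    "src/data_build.py",
    "src/train.py",
]


def build_commands(model_path, skip_rule_build=True, python=sys.executable):
    """Return the list of commands for the full pipeline, ending with generate.py."""
    commands = []
    for script in STEPS:
        if skip_rule_build and script == "src/rule_system_build.py":
            continue
        commands.append([python, script])
    # model_path is the checkpoint written by training,
    # i.e. [save_to]/gpus_0valid_best.pt
    commands.append([python, "src/generate.py", "-train_from", model_path])
    return commands


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step
```

Keeping `dry_run=True` as the default makes it easy to inspect the sequence before committing to a long training run.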

Evaluation

Please use amr-evaluation-tool-enhanced. It is based on Marco Damonte's amr-evaluation-tool, but with a correction to the unlabeled edge score.

Parsing

Either a) parse a file in which each line consists of a single sentence; the output is saved to [file]_parsed

python src/parse.py -train_from [gpus_0valid_best.pt] -input [file]

or b) parse a single sentence given directly on the command line

python src/parse.py -train_from [gpus_0valid_best.pt] -text [type sentence here]
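For option a), the input file needs exactly one sentence per line. A minimal sketch for preparing such a file and locating the result is shown here; `write_sentence_file` and `parsed_output_path` are hypothetical helpers, not part of the repository.

```python
from pathlib import Path


def write_sentence_file(sentences, path):
    """Write one sentence per line, the layout expected by the -input option."""
    # Collapse internal newlines and repeated whitespace so that each
    # sentence occupies exactly one line; drop empty entries.
    cleaned = [" ".join(s.split()) for s in sentences]
    cleaned = [s for s in cleaned if s]
    Path(path).write_text("".join(s + "\n" for s in cleaned), encoding="utf-8")
    return len(cleaned)


def parsed_output_path(path):
    """Return where the parser output is saved according to this README: [file]_parsed."""
    return str(path) + "_parsed"
```

After writing `sentences.txt` this way, run the -input command above with that file and read the parses from `parsed_output_path("sentences.txt")`.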

Pretrained models

Keep the files under the data/ folder unchanged and download the model. This should be enough to run parsing.

Notes

Running "python src/preprocessing.py" starts from the sentences in the original AMR files, whereas the version in the paper was trained on the tokenized version provided by the AMR-to-English aligner, so the results could be slightly different. Also, to build a parser for out-of-domain data, please start preprocessing with "python src/preprocessing.py" to keep everything consistent.

Contact

Contact chunchuan.lv@gmail.com if you have any questions!