Amazing-J / structural-transformer

Code corresponding to our paper "Modeling Graph Structure in Transformer for Better AMR-to-Text Generation" in EMNLP-IJCNLP-2019

Structural Transformer Model

This repository contains the code for our paper "Modeling Graph Structure in Transformer for Better AMR-to-Text Generation" in EMNLP-IJCNLP-2019.

The code is developed under PyTorch 1.0. Because of PyTorch compatibility issues, it may not load under some earlier versions (such as 0.4.0).

Please create issues if there are any questions! This can make things more tractable.

About AMR

AMR is a graph-based semantic formalism that can unify the representations of several sentences with the same meaning. Compared with other structures, such as dependency trees and semantic roles, AMR graphs have several key differences.

Data preprocessing

Baseline Input

Our baseline uses the depth-first traversal strategy of Konstas et al. to linearize AMR graphs and obtain simplified AMRs. We remove variables, wiki links and sense tags before linearization. For example, the AMR

(a / and
      :op1 (b / begin-01
            :ARG1 (i / it)
            :ARG2 (t / thing :wiki "Massachusetts_health_care_reform"
                  :name (n / name :op1 "Romneycare")))
      :op2 (e / end-01
            :ARG1 i
            :ARG2 t))

needs to be simplified to:

and :op1 ( begin :arg1 it :arg2 ( thing :name ( name :op1 romneycare ) ) ) :op2 ( end :arg1 it :arg2 thing )

After this transformation, you still need to apply Byte Pair Encoding (BPE) to the result. On the target side, we use the PTB tokenizer from Stanford CoreNLP to preprocess our data. We also provide sample input for the baseline (./corpus_sample/baseline_corpus).
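The simplification step described above (dropping variables, wiki links and sense tags, then emitting the depth-first linearization) can be sketched in a few lines of Python. This is only an illustration, not the preprocessing script used in this repository: the function names `tokenize` and `simplify_amr` are hypothetical, and the sketch assumes well-formed PENMAN input in which every :wiki attribute has a single-token value.

```python
import re

def tokenize(amr):
    # Parentheses, quoted strings, or any other run of non-space characters.
    return re.findall(r'\(|\)|"[^"]*"|[^\s()"]+', amr)

def simplify_amr(amr):
    """Hypothetical helper: linearize one PENMAN-style AMR into a simplified AMR."""
    tokens = tokenize(amr)
    # First pass: map each variable to its concept, with the sense tag removed.
    concepts = {}
    for k in range(len(tokens) - 2):
        if tokens[k + 1] == "/":
            concepts[tokens[k]] = re.sub(r"-\d+$", "", tokens[k + 2])
    # Second pass: emit tokens in their original (depth-first) order.
    out = []
    k = 0
    while k < len(tokens):
        tok = tokens[k]
        if tok == ":wiki":
            k += 2                          # drop the wiki link and its value
            continue
        if k + 1 < len(tokens) and tokens[k + 1] == "/":
            out.append(concepts[tok])       # "b / begin-01" becomes "begin"
            k += 3
            continue
        if tok in ("(", ")"):
            out.append(tok)
        elif tok.startswith(":"):
            out.append(tok.lower())         # ":ARG1" becomes ":arg1"
        elif tok in concepts:
            out.append(concepts[tok])       # re-entrant variable becomes its concept
        else:
            out.append(tok.strip('"').lower())
        k += 1
    # A leaf such as "( it )" is written without parentheses.
    collapsed = []
    for tok in out:
        if tok == ")" and len(collapsed) >= 2 and collapsed[-2] == "(":
            leaf = collapsed.pop()
            collapsed.pop()
            collapsed.append(leaf)
        else:
            collapsed.append(tok)
    # The root is not wrapped in parentheses either.
    if len(collapsed) > 1 and collapsed[0] == "(" and collapsed[-1] == ")":
        collapsed = collapsed[1:-1]
    return " ".join(collapsed)

example = '''
(a / and
      :op1 (b / begin-01
            :ARG1 (i / it)
            :ARG2 (t / thing :wiki "Massachusetts_health_care_reform"
                  :name (n / name :op1 "Romneycare")))
      :op2 (e / end-01
            :ARG1 i
            :ARG2 t))
'''
print(simplify_amr(example))
```

On the example graph this reproduces the simplified string shown above. BPE is a separate step and is not covered by the sketch.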

Structural Transformer Input

Structure-Aware Self-Attention:

$$e_{ij} = \frac{\left(x_iW^Q\right)\left(x_jW^K + r_{ij}W^R\right)^{T}}{\sqrt{d_z}}$$

Note that the relation $$r_{ij}$$ is the vector representation for element pair ($$x_i$$, $$x_j$$).
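To make the formula concrete, here is a minimal pure-Python sketch that computes the scores $$e_{ij}$$ for a toy input. It is not the code in this repository (which is built on PyTorch); the function names and the list-of-lists representation are assumptions chosen so the example runs without any dependency.

```python
import math

def matvec(v, W):
    """Row vector v (length d_in) times matrix W (d_in x d_out)."""
    return [sum(v[a] * W[a][b] for a in range(len(v))) for b in range(len(W[0]))]

def structure_aware_scores(x, r, W_q, W_k, W_r):
    """Return e with e[i][j] = (x_i W^Q) . (x_j W^K + r_ij W^R) / sqrt(d_z)."""
    q = [matvec(x_i, W_q) for x_i in x]      # queries
    k = [matvec(x_j, W_k) for x_j in x]      # keys
    d_z = len(q[0])
    n = len(x)
    e = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            rel = matvec(r[i][j], W_r)       # projected relation vector for the pair (i, j)
            key = [k[j][c] + rel[c] for c in range(d_z)]
            e[i][j] = sum(q[i][c] * key[c] for c in range(d_z)) / math.sqrt(d_z)
    return e

# Toy usage: two 2-dimensional inputs and identity projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
x = [[1.0, 0.0], [0.0, 1.0]]
r = [[[0.0, 0.0], [1.0, 0.0]],
     [[0.0, 0.0], [0.0, 0.0]]]               # only the pair (0, 1) carries a relation
scores = structure_aware_scores(x, r, identity, identity, identity)
print(scores)
```

When every $$r_{ij}$$ is the zero vector, the scores reduce to ordinary scaled dot-product attention, which is a quick sanity check for any implementation.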

We also use the depth-first traversal strategy to linearize AMR graphs, but here the simplified AMRs consist only of concepts. As shown below, the input sequence is much shorter than the input sequence in the baseline. This sequence is the -train_src input, for example corpus_sample/.../train_concept_no_EOS_bpe:

and begin it thing name romneycare end it thing

In addition, we obtain a matrix that records the graph structure between every pair of concepts, which reflects their semantic relationship.

Learning Graph Structure Representation for Concept Pairs

The above structure-aware self-attention is capable of incorporating graph structure between concept pairs. We use the sequence of edge labels along the path from $$x_i$$ to $$x_j$$ to indicate the AMR graph structure between concepts $$x_i$$ and $$x_j$$. To distinguish the edge direction, we add a direction symbol to each label: $$\uparrow$$ for climbing up along the path, and $$\downarrow$$ for going down. For the special case of $$i = j$$, we use None as the path.
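The path construction can be illustrated with a short sketch. It rests on several assumptions that are not taken from this repository: the linearized concepts form a tree (re-entrant concepts appear as separate leaves, as in the concept sequence above), the tree is given as hypothetical `parent` and `label` arrays indexed by position, and the direction symbol is simply appended to the label. The spelling of labels in the actual corpus files may differ.

```python
def ancestors(node, parent):
    """Chain from a node up to the root, starting with the node itself."""
    chain = [node]
    while parent[chain[-1]] is not None:
        chain.append(parent[chain[-1]])
    return chain

def structural_path(i, j, parent, label):
    """Edge labels from node i to node j, with an arrow marking the direction."""
    if i == j:
        return ["None"]
    on_j_side = set(ancestors(j, parent))
    up = []
    node = i
    while node not in on_j_side:             # climb from i to the lowest common ancestor
        up.append(label[node] + "↑")
        node = parent[node]
    lca = node
    down = []
    node = j
    while node != lca:                       # collect the edges from j up to that ancestor
        down.append(label[node] + "↓")
        node = parent[node]
    return up + down[::-1]                   # reversed so the labels read top-down

# Concepts: and begin it thing name romneycare end it thing
parent = [None, 0, 1, 1, 3, 4, 0, 6, 6]
label = [None, ":op1", ":arg1", ":arg2", ":name", ":op1", ":op2", ":arg1", ":arg2"]

print(structural_path(2, 5, parent, label))  # from "it" to "romneycare"
print(structural_path(1, 0, parent, label))  # from "begin" to "and"
```

The first call climbs one edge from "it" to "begin" and then descends three edges to "romneycare"; the second is a single upward step.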

Feature-based Method

A natural way to represent the structural path is to view it as a string feature. To this end, we combine the labels in the structural path into a single string. The model used is ./opennmt-feature. The parameter -train_structure represents the structural relationship in the AMR graph. We give the corresponding corpus sample in corpus_sample/all_path_corpus. Each line in the corpus represents the structural relationship between all nodes in one AMR graph. Assuming $$n$$ concept nodes are input (-train_src), there will be $${(n+1)}^2$$ tokens in this line, each token representing a path relationship. (There is also an EOS token at the end of the input sequence, which is why the count is $${(n+1)}^2$$ rather than $$n^2$$.)
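As a rough illustration of the token count, the sketch below flattens an n x n table of paths into one line of $${(n+1)}^2$$ string tokens. Several details are assumptions made only for this example: labels are joined without a separator, the matrix is written in row-major order, and pairs that involve the EOS position use a hypothetical placeholder token. Check corpus_sample/all_path_corpus for the conventions actually used.

```python
def path_to_feature(path):
    """Join the labels of one structural path into a single string token."""
    return "".join(path)

def structure_line(paths, eos_token="<eos_path>"):
    """Flatten an n x n table of paths into one line with (n + 1) ** 2 tokens.

    The extra row and column stand for the EOS position; eos_token is a
    hypothetical placeholder, not the symbol used in the real corpus.
    """
    n = len(paths)
    tokens = []
    for i in range(n + 1):
        for j in range(n + 1):
            if i == n or j == n:
                tokens.append(eos_token)
            else:
                tokens.append(path_to_feature(paths[i][j]))
    return " ".join(tokens)

# Two concepts ("and", "begin") joined by an :op1 edge.
paths = [
    [["None"], [":op1↓"]],
    [[":op1↑"], ["None"]],
]
line = structure_line(paths)
print(line)
print(len(line.split()))
```

With n = 2 concepts the line holds 9 tokens, matching $${(n+1)}^2$$.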

Avg/Sum/CNN/SA-based Method

To overcome the data sparsity of the above feature-based method, we view the structural path as a label sequence. We give the corresponding corpus sample in corpus_sample/five_path_corpus. We split the -train_structure file of the feature-based method into several corpora: -train_edge_all_bpe_1, -train_edge_all_bpe_2, and so on. For example, -train_edge_all_bpe_1 contains only the first token of each structural path, -train_edge_all_bpe_2 contains only the second token of each structural path, and so on. (In our experiments, setting the length to 4 works best, which means that we use only the first four corpora.)
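The split can be pictured as follows. This is a sketch under stated assumptions rather than the script used to build corpus_sample/five_path_corpus: each path is already available as a list of labels, paths shorter than the chosen length are filled with a hypothetical padding symbol, and longer paths are truncated.

```python
def split_by_position(paths, max_len=4, pad="<blank>"):
    """Return max_len lines; line p holds the p-th label of every path.

    pad is a hypothetical filler for paths shorter than p + 1 labels.
    """
    lines = []
    for p in range(max_len):
        lines.append(" ".join(path[p] if p < len(path) else pad for path in paths))
    return lines

# Three paths taken from one graph, in the order they appear in the structure line.
paths = [
    ["None"],
    [":arg1↑", ":arg2↓", ":name↓"],
    [":op1↓"],
]
for position, line in enumerate(split_by_position(paths, max_len=2), start=1):
    print(position, line)
```

Line 1 would go to the file passed as -train_edge_all_bpe_1, line 2 to -train_edge_all_bpe_2, and so on, so that every file keeps one token per concept pair.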

After the corresponding corpus is prepared, modify the PATH within "". Pay attention to the field "data_dir", which is the directory of pre-processed data that will be used during training. We usually set it per experiment, for example "./workspace/data". Finally, execute the corresponding script file with bash. Data preprocessing is then complete.


First, modify the PATH within "". The field "data_prefix" is the preprocessing directory mentioned above, followed by the prefix gq, for example "./workspace/data/gq". Finally, execute the corresponding script file with bash.


All you need to do is change the PATH in the "" accordingly, and then execute bash


If you like our paper, please cite

  title={Modeling Graph Structure in Transformer for Better AMR-to-Text Generation},
  author={Jie Zhu, Junhui Li, Muhua Zhu, Longhua Qian, Min Zhang and Guodong Zhou},
  booktitle={Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP-2019)},