Open flrngel opened 6 years ago
https://arxiv.org/abs/1711.02281
Abstract
Features
How
1. Introduction
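The paper's model builds on a self-attention network (SAN, i.e. the Transformer) to avoid autoregressive decoding. CNN and SAN models already remove recurrence, but they still decode autoregressively.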
2. Background
2.1. Autoregressive Neural Machine Translation
2.2. Non-Autoregressive decoding
Problems of beam-search
They treat the output length T as a random variable with its own conditional distribution, so every target position can then be predicted in parallel.
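For reference, the two factorizations can be written side by side. This is a sketch from my reading of Section 2, using the notation as I recall it ($T'$ is the source length, $p_L$ the length distribution); check the symbols against the paper before relying on them.

```latex
% Autoregressive: each token conditions on all previous tokens.
p_{AR}(Y \mid X; \theta) = \prod_{t=1}^{T+1} p(y_t \mid y_{0:t-1}, x_{1:T'}; \theta)

% Naive non-autoregressive: length first, then all tokens independently.
p_{NA}(Y \mid X; \theta) = p_L(T \mid x_{1:T'}; \theta) \cdot \prod_{t=1}^{T} p(y_t \mid x_{1:T'}; \theta)
```

The second product has no dependence on earlier outputs, which is what allows parallel decoding and also what causes the multimodality problem.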
2.3. The multimodality problem
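The multimodality problem: target translations follow a highly multimodal distribution, which a decoder that predicts each position independently cannot capture. For example, "Thank you" can become "Danke schön" or "Vielen Dank", and independent positions can mix them into "Danke Dank".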
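A minimal sketch of why conditional independence hurts here, assuming a toy target distribution with two equally likely translations of "Thank you". The helper names are made up for illustration and nothing here comes from the paper's code.

```python
import itertools

# Toy assumption: two equally likely reference translations of "Thank you".
modes = [("Danke", "schön"), ("Vielen", "Dank")]


def position_marginals(modes):
    """Per-position token distributions implied by a uniform mixture over modes."""
    length = len(modes[0])
    marginals = []
    for t in range(length):
        dist = {}
        for mode in modes:
            dist[mode[t]] = dist.get(mode[t], 0.0) + 1.0 / len(modes)
        marginals.append(dist)
    return marginals


def independent_joint(marginals):
    """Joint distribution when every position is predicted independently."""
    joint = {}
    for combo in itertools.product(*(d.items() for d in marginals)):
        tokens = tuple(tok for tok, _ in combo)
        prob = 1.0
        for _, p in combo:
            prob *= p
        joint[tokens] = prob
    return joint


marginals = position_marginals(modes)
joint = independent_joint(marginals)

# Probability mass that lands on sequences that are not valid translations.
invalid_mass = sum(p for seq, p in joint.items() if seq not in modes)
print(joint)
print("mass on mixed-up outputs:", invalid_mass)
```

Even with perfect per-position marginals, half of the probability mass in this toy case goes to mixed outputs such as "Danke Dank" and "Vielen schön".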
3. The non-autoregressive transformer
3.3. Modeling fertility to tackle the multimodality problem
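Fertilities are obtained from a hard alignment algorithm such as IBM Model 2.
Definition of fertilities and its benefits: a source word's fertility is the number of target words aligned to it, so the fertilities fix the output length and tell the decoder which source words to copy as input.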
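A small sketch of how fertilities could be counted from a hard alignment and then used to build the decoder input by copying source tokens. The sentence, the alignment links and both function names are hypothetical; the paper's actual pipeline (external aligner, learned fertility predictor) is not reproduced here.

```python
from collections import Counter


def fertilities_from_alignment(num_source, alignment):
    """Count how many target words are linked to each source position.

    alignment: list of (source_index, target_index) hard links,
    one link per target word (hypothetical input format).
    """
    counts = Counter(src for src, _ in alignment)
    return [counts.get(i, 0) for i in range(num_source)]


def copy_by_fertility(source_tokens, fertilities):
    """Repeat each source token as many times as its fertility."""
    decoder_inputs = []
    for token, fertility in zip(source_tokens, fertilities):
        decoder_inputs.extend([token] * fertility)
    return decoder_inputs


# Hypothetical example: "accept" aligns to two target words, "it" to none.
source = ["we", "accept", "it", "."]
alignment = [(0, 0), (1, 1), (1, 2), (3, 3)]

fertilities = fertilities_from_alignment(len(source), alignment)
decoder_inputs = copy_by_fertility(source, fertilities)

print(fertilities)
print(decoder_inputs)
```

The target length is just the sum of the fertilities, so no separate length prediction is needed once the fertilities are known.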
3.4. Translation predictor and the decoding process
4. Training
I didn't like this section
4.2. Fine-Tuning
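Fine-tuning combines a reverse KL divergence term against the teacher, an RL (REINFORCE) term for the fertility predictor, and ordinary backpropagation for the rest of the model.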
Word-level knowledge distillation (Teacher)
External fertility inference model
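To see why the direction of the KL term matters, here is a generic sketch comparing forward and reverse KL on a toy three-word vocabulary. The distributions are invented and this is not the paper's exact loss, only the textbook KL divergence between categorical distributions.

```python
import math


def kl(p, q):
    """KL(p || q) for two categorical distributions with positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical teacher with two strong modes over a 3-word vocabulary.
teacher = [0.495, 0.495, 0.01]

# Two hypothetical students: one commits to a single mode, one spreads out.
peaked = [0.98, 0.01, 0.01]
spread = [1 / 3, 1 / 3, 1 / 3]

# Forward KL(teacher || student) rewards covering every teacher mode.
forward_peaked = kl(teacher, peaked)
forward_spread = kl(teacher, spread)

# Reverse KL(student || teacher) rewards committing to a mode the teacher likes.
reverse_peaked = kl(peaked, teacher)
reverse_spread = kl(spread, teacher)

print("forward:", forward_peaked, forward_spread)
print("reverse:", reverse_peaked, reverse_spread)
```

In this toy case the forward direction prefers the spread-out student, while the reverse direction prefers the peaked one, which matches the motivation for using a reverse KL term with a non-autoregressive student.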
Todo