Open Opdoop opened 7 years ago
That padding code is for the decoder inputs and targets during training, not test-time decoding. During decoding, the decoder is run one step at a time with beam search, and decoding stops when the STOP token is generated. So yes, it can generate variable length summaries at test time.
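For intuition, the test-time loop looks roughly like this (a simplified greedy sketch, not the repository's actual beam-search code; `decode_one_step`, `STOP_TOKEN`, and `MAX_DEC_STEPS` are placeholder names):

```python
# Minimal sketch of one-step-at-a-time decoding that stops at the STOP token.
import numpy as np

STOP_TOKEN = 3        # hypothetical id of the [STOP] token
MAX_DEC_STEPS = 100   # hard cap on decoder steps, analogous to max_dec_steps

def decode_one_step(prev_token, state):
    """Stand-in for running the decoder for a single step.
    Returns a fake distribution over a tiny vocabulary and the next state."""
    vocab_size = 10
    dist = np.random.dirichlet(np.ones(vocab_size))
    return dist, state

def decode(start_token=2):
    tokens, state = [], None
    prev = start_token
    for _ in range(MAX_DEC_STEPS):
        dist, state = decode_one_step(prev, state)
        prev = int(np.argmax(dist))
        if prev == STOP_TOKEN:   # stop as soon as [STOP] is produced
            break
        tokens.append(prev)
    return tokens                # length varies per example
```

Because the loop exits whenever the STOP token appears, each summary comes out at its own length regardless of the fixed padding used during training.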
@abisee Thanks!
When training the model with tf.app.flags.DEFINE_boolean('pointer_gen', True, 'If True, use pointer-generator model. If False, use baseline model.'), is it possible that the summary generated by decoding (beam search) contains an [UNK] token?
Yes, the pointer-generator model is able to produce UNK tokens during decoding. UNK is part of the vocabulary object and the pointer-generator decoder has access to the whole vocabulary.
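As a rough illustration of what "UNK is just another word in the vocabulary" means (this is a toy sketch, not the repo's actual Vocab class; `TinyVocab` and the token names are made up):

```python
# UNK is an ordinary vocabulary entry with its own id,
# so the generator's softmax can assign it probability like any other word.
SPECIAL_TOKENS = ['[PAD]', '[UNK]', '[START]', '[STOP]']

class TinyVocab:
    def __init__(self, words):
        self.word2id = {w: i for i, w in enumerate(SPECIAL_TOKENS + words)}
        self.id2word = {i: w for w, i in self.word2id.items()}

    def id(self, word):
        # OOV source words fall back to the [UNK] id
        return self.word2id.get(word, self.word2id['[UNK]'])

vocab = TinyVocab(['the', 'cat', 'sat'])
print(vocab.id('flamingo'))   # -> id of [UNK]
```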
@abisee Thanks so much! In the paper, from both the Supplementary Material and Figure 1, it seems the pointer-generator model solved the OOV problem. Have you ever observed an [UNK] token when using the pointer-generator model?
The final distribution is calculated by equation (9), and the paper notes that if w is an out-of-vocabulary (OOV) word, then P_vocab(w) is zero; similarly, if w does not appear in the source document, then ∑_{i: w_i = w} a_i^t is zero. Under what condition will the pointer-generator model generate [UNK]?
We very rarely / perhaps never see UNKs in the output of the pointer-generator model. However, this is mostly because at test time, the pointer-generator model acts in pointing mode almost all of the time. This is a problem; we'd like it to write more abstractively. Figuring that out is a future research direction. Perhaps if it wrote more abstractively it would sometimes generate UNKs.
That line in the paper is just saying (1) If a word is out-of-vocabulary, e.g. "flamingo", then P_vocab(flamingo)=0, as you'd expect. So the only way you can generate "flamingo" is if it appears in the source document. (2) If a word (which may be in-vocabulary) e.g. "how" doesn't appear in the source document, then the probability of pointing to "how" is zero (that's the attention sum). Given that in the code we treat UNK as just another word in the vocabulary, and we have no UNKs in the source document, the pointer-generator model can generate an UNK if P_vocab(UNK) and the generation probability p_gen are both sufficiently large.
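To make that concrete, here is a small NumPy sketch of the final-distribution mixture from equation (9), P(w) = p_gen * P_vocab(w) + (1 - p_gen) * ∑_{i: w_i = w} a_i^t. All names and numbers are made up for illustration; this is not the repository's implementation:

```python
# Toy version of equation (9): mix the generation and copy distributions
# over an "extended" vocabulary (fixed vocab + per-article OOV slots).
import numpy as np

def final_distribution(p_vocab, attn, src_ids, p_gen, extended_vsize):
    """p_vocab: generator distribution over the fixed vocab (incl. [UNK]).
    attn: attention weights over source positions.
    src_ids: extended-vocab ids of the source tokens."""
    final = np.zeros(extended_vsize)
    final[:len(p_vocab)] += p_gen * p_vocab          # generation term
    np.add.at(final, src_ids, (1.0 - p_gen) * attn)  # copy term
    return final

UNK_ID = 1
p_vocab = np.array([0.05, 0.30, 0.25, 0.20, 0.20])   # P_vocab([UNK]) = 0.30
attn = np.array([0.6, 0.4])
src_ids = np.array([5, 6])   # source words are OOV -> extended-vocab slots
p_gen = 0.9                  # model is mostly in "generation" mode

dist = final_distribution(p_vocab, attn, src_ids, p_gen, extended_vsize=7)
print(dist[UNK_ID])          # 0.27 -- the largest entry in this toy example
```

In this toy case no copy probability lands on [UNK], yet [UNK] still ends up with the highest final probability, precisely because p_gen and P_vocab(UNK) are both large.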
Thanks for sharing the awesome implementation.
In batcher.py the padding function seems to pad the decoder input to max_len. Is your model able to generate variable-length summaries eventually?