abisee / pointer-generator

Code for the ACL 2017 paper "Get To The Point: Summarization with Pointer-Generator Networks"

Question about generated summary #53

Open · Opdoop opened this issue 7 years ago

Opdoop commented 7 years ago

Thanks for sharing the awesome implementation.

In batcher.py the padding function seems to pad the decoder input up to max_len. Is your model still able to generate variable-length summaries?

```python
def pad_decoder_inp_targ(self, max_len, pad_id):
  """Pad decoder input and target sequences with pad_id up to max_len."""
  while len(self.dec_input) < max_len:
    self.dec_input.append(pad_id)
  while len(self.target) < max_len:
    self.target.append(pad_id)
```

abisee commented 7 years ago

That padding code is for the decoder inputs and targets during training, not test-time decoding. During decoding, the decoder is run one step at a time with beam search, and decoding stops when the STOP token is generated. So yes, it can generate variable length summaries at test time.
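
For illustration, here is a minimal sketch of that test-time stopping behavior. It is not the repo's actual beam-search code, and `STOP_ID`, `MAX_DEC_STEPS`, and `step_fn` are placeholders:

```python
# Minimal sketch of variable-length decoding (not the repo's beam-search code).
# STOP_ID and MAX_DEC_STEPS are placeholder values, not taken from the repo.
STOP_ID = 3
MAX_DEC_STEPS = 100

def decode(step_fn, start_id):
    """step_fn(prev_id, state) -> (next_id, new_state); one decoder step at a time."""
    output, prev_id, state = [], start_id, None
    for _ in range(MAX_DEC_STEPS):
        prev_id, state = step_fn(prev_id, state)
        if prev_id == STOP_ID:      # stop as soon as the STOP token is generated
            break
        output.append(prev_id)
    return output                   # length varies from example to example
```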

Opdoop commented 7 years ago

@abisee Thanks! When the model is trained with tf.app.flags.DEFINE_boolean('pointer_gen', True, 'If True, use pointer-generator model. If False, use baseline model.'), is it possible for the summary produced during decoding (beam search) to contain the [UNK] token?

abisee commented 7 years ago

Yes, the pointer-generator model is able to produce UNK tokens during decoding. UNK is part of the vocabulary object and the pointer-generator decoder has access to the whole vocabulary.
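
As a small illustration of that point (a sketch only; the token strings and ids below are placeholders, not copied from the repo's data.py), any out-of-vocabulary lookup falls back to the UNK id, and UNK itself is an ordinary vocabulary entry:

```python
# Sketch: [UNK] is an ordinary vocabulary entry, so the decoder's vocabulary
# distribution always assigns it some probability mass.
# Token strings and ids are illustrative placeholders.
special_tokens = ['[PAD]', '[UNK]', '[START]', '[STOP]']
words = special_tokens + ['the', 'cat', 'sat']
word2id = {w: i for i, w in enumerate(words)}

def word_to_id(word):
    # any word outside the vocabulary falls back to the [UNK] id
    return word2id.get(word, word2id['[UNK]'])

print(word_to_id('cat'))       # 5
print(word_to_id('flamingo'))  # 1, the [UNK] id
```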

Opdoop commented 7 years ago

@abisee Thanks so much! In the paper, both the Supplementary Material and Figure 1 suggest that the pointer-generator model solves the OOV problem.

  1. Have you ever observed the [UNK] token when using the pointer-generator model?

  2. The Final Distribution is calculated by equation (9), and the paper notes that if w is an out-of-vocabulary (OOV) word, then P_vocab(w) is zero; similarly, if w does not appear in the source document, then ∑_{i: w_i = w} a_i^t is zero (equation (9) is written out below for reference). Under what conditions will the pointer-generator model generate UNK?
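
For reference, equation (9) of the paper defines the final distribution as a mixture of the vocabulary distribution and the attention (copy) distribution:

P(w) = p_gen · P_vocab(w) + (1 − p_gen) · ∑_{i: w_i = w} a_i^t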

abisee commented 7 years ago

  1. We very rarely / perhaps never see UNKs in the output of the pointer-generator model. However, this is mostly because at test time, the pointer-generator model acts in pointing mode almost all of the time. This is a problem; we'd like it to write more abstractively. Figuring that out is a future research direction. Perhaps if it wrote more abstractively it would sometimes generate UNKs.

  2. That line in the paper is just saying: (1) if a word is out-of-vocabulary, e.g. "flamingo", then P_vocab(flamingo) = 0, as you'd expect, so the only way to generate "flamingo" is if it appears in the source document; (2) if a word that may be in-vocabulary, e.g. "how", doesn't appear in the source document, then the probability of pointing to "how" is zero (that's the attention sum). Given that in the code we treat UNK as just another word in the vocabulary, and there are no UNKs in the source document, the pointer-generator model can generate UNK if P_vocab(UNK) and the generation probability p_gen are both sufficiently large.
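
To make the condition concrete, here is a hedged sketch of the final-distribution computation from equation (9). Variable names are illustrative, not the repo's (the repo builds this with TensorFlow ops; this uses NumPy for readability):

```python
import numpy as np

def final_distribution(p_vocab, attn, src_ids, p_gen, extended_vsize):
    """Mix the vocabulary distribution and the copy (attention) distribution.

    p_vocab: [vsize] softmax over the fixed vocabulary (includes UNK)
    attn:    [src_len] attention weights over source positions
    src_ids: source token ids in the extended vocabulary (in-article OOVs get
             temporary ids past vsize, so UNK never appears here)
    """
    p_final = np.zeros(extended_vsize)
    p_final[:len(p_vocab)] = p_gen * p_vocab           # generation part
    for i, w in enumerate(src_ids):                    # copy (pointing) part
        p_final[w] += (1.0 - p_gen) * attn[i]
    return p_final

# Since UNK is never among src_ids, its copy probability is zero, so
# P_final(UNK) = p_gen * P_vocab(UNK): UNK can still be emitted whenever
# p_gen and P_vocab(UNK) are both large enough.
```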