alireza202 opened this issue 6 years ago
P_vocab is a distribution over the vocabulary words. So everything outside of the vocab has no mass.
But OOV is added to the vocabulary, right?
The vocab is fixed size throughout. I am looking into this issue in depth. The way OOV words are predicted during decoding is really only meaningful during training, where the target sentence guides the prediction. During testing, because the OOV words have no vector representation and don't participate in the attention-driven context, the model has to use other available information. I suspect the model is leveraging the order of the OOV words and the context information from their non-OOV neighboring words.
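For context, here is a minimal sketch of how in-article OOV words can be given temporary ids just past the fixed vocabulary, which is what lets the target sentence "guide" copying during training. The names (`article_to_extended_ids`, `word2id`, `unk_id`) are my own for illustration, not the repo's exact API:

```python
# Hypothetical sketch, not the repo's exact code: give in-article OOVs
# temporary ids just past the fixed vocabulary.
def article_to_extended_ids(article_words, word2id, unk_id):
    ids = []            # encoder input ids: every OOV maps to unk_id
    extended_ids = []   # ids in the extended vocab: OOVs get temporary ids
    oovs = []           # in-article OOV words, in order of first appearance
    vocab_size = len(word2id)
    for w in article_words:
        if w in word2id:
            ids.append(word2id[w])
            extended_ids.append(word2id[w])
        else:
            ids.append(unk_id)
            if w not in oovs:
                oovs.append(w)
            # temporary id sits past the fixed vocab: vocab_size + 0, + 1, ...
            extended_ids.append(vocab_size + oovs.index(w))
    return ids, extended_ids, oovs
```

If the target summary is mapped with the same `oovs` list, an OOV that appears in both article and target gets a consistent temporary id, so the loss can reward copying it even though it is outside the fixed vocab.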
In model.py, lines 163-164:
```python
extra_zeros = tf.zeros((self._hps.batch_size, self._max_art_oovs))
vocab_dists_extended = [tf.concat(axis=1, values=[dist, extra_zeros]) for dist in vocab_dists]  # list length max_dec_steps of shape (batch_size, extended_vsize)
```
It pads the original vocab_dists with a zeros tensor, which means P_vocab(OOV) = 0.
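To make that concrete, here is a rough sketch of how the zero-padded vocabulary distribution and the attention distribution can be combined for one decoder step. This is not the repo's exact final-distribution code and the names are assumptions; the point is that the scattered copy term is the only place an in-article OOV can get probability mass:

```python
import tensorflow as tf

# Rough sketch (assumed names, not the repo's exact code) of combining the
# padded vocab distribution with the copy/attention distribution.
def calc_final_dist(vocab_dist, attn_dist, p_gen, enc_batch_extend_ids,
                    batch_size, max_art_oovs):
    # vocab_dist: (batch_size, vsize), attn_dist: (batch_size, enc_len),
    # p_gen: (batch_size, 1), enc_batch_extend_ids: (batch_size, enc_len) int32
    vsize = int(vocab_dist.shape[1])
    extended_vsize = vsize + max_art_oovs
    extra_zeros = tf.zeros((batch_size, max_art_oovs))
    vocab_dist_extended = p_gen * tf.concat([vocab_dist, extra_zeros], axis=1)

    # Scatter attention weights onto their source-word ids in the extended
    # vocab: in-article OOVs only receive mass through this copy term.
    enc_len = tf.shape(enc_batch_extend_ids)[1]
    batch_nums = tf.tile(tf.expand_dims(tf.range(batch_size), 1), [1, enc_len])
    indices = tf.stack([batch_nums, enc_batch_extend_ids], axis=2)
    copy_dist = tf.scatter_nd(indices, (1.0 - p_gen) * attn_dist,
                              [batch_size, extended_vsize])
    return vocab_dist_extended + copy_dist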
I find that in the encoding step, an OOV input word is represented as the UNK token. According to the code, if an OOV word is copied by the model, does that mean the UNK embedding contributes a lot to that decoding step? I agree with bhomass that the model is leveraging the context of the OOV word.
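A tiny illustration of that point, with toy sizes I made up: both OOV positions below are fed the exact same UNK embedding row, so anything that distinguishes one copied OOV from another has to come from its neighbors through the encoder, not from its own input vector.

```python
import tensorflow as tf

# Toy illustration with assumed sizes: positions 1 and 3 are OOV, so both are
# fed the same UNK embedding row; they differ only through their neighbors.
vocab_size, emb_dim, unk_id = 5, 4, 0
embedding = tf.random.normal([vocab_size, emb_dim])
enc_ids = tf.constant([[1, unk_id, 3, unk_id]])          # OOVs already mapped to UNK
enc_inputs = tf.nn.embedding_lookup(embedding, enc_ids)  # shape (1, 4, emb_dim)
```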
Do you manually set P_vocab(OOV) = 0 in your code somewhere? I can't seem to find such a thing. In your paper you said:
How would P_vocab(OOV) be zero? If you don't set it to zero manually, it won't be. What if an OOV word is selected (in the extended vocab) during decoding? Do you replace it during postprocessing?
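For what it's worth, this is the kind of postprocessing I would expect (a hypothetical sketch with my own names, not necessarily what the repo does): any decoded id beyond the fixed vocab size is mapped back to the corresponding in-article OOV word, so the copied word appears in the output instead of UNK.

```python
# Hypothetical sketch, not the repo's exact function: map decoded ids from the
# extended vocab back to words, using the per-article OOV list.
def ids_to_words(output_ids, id2word, article_oovs):
    vocab_size = len(id2word)
    words = []
    for i in output_ids:
        if i < vocab_size:
            words.append(id2word[i])
        else:
            oov_idx = i - vocab_size
            if oov_idx < len(article_oovs):
                words.append(article_oovs[oov_idx])
            else:
                words.append("[UNK]")  # shouldn't happen if ids are consistent
    return words
```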