Hey all.
There's a problem: when my target vocabulary is smaller than the beam size, the beam size gets truncated.
Why can that be a problem? Many sequence labeling tasks are moving from non-autoregressive to autoregressive decoders. For example, a POS tagger would benefit from conditioning on previous decisions, as would extractive sentence compression as in [Filippova et al. 2015](https://aclweb.org/anthology/D/D15/D15-1042.pdf). For those applications you cannot experiment with a beam size larger than the vocabulary size, which can be very small (2, in the case of sentence compression).
How / why does that happen? Let B be the beam size and V the vocabulary size. If
B > V
then during the early steps of decoding the pool of candidates for expanding the beam is smaller than the beam size, i.e. V^t < B, which means we don't have enough candidates to fill the beam. This forces you to truncate the beam size to the vocab size. This happens in these two lines: https://github.com/pytorch/fairseq/blob/master/fairseq/sequence_generator.py#L127 https://github.com/pytorch/fairseq/blob/master/fairseq/search.py#L70
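To see the truncation concretely, here is a minimal PyTorch sketch (not fairseq's actual code; the toy sizes and variable names are made up for illustration): with a single initial hypothesis, the first decoding step only offers V candidates, so `topk` can be asked for at most `min(beam_size, V)` of them.

```python
import torch

# Toy illustration (not fairseq's code): at step t=1 there are only V
# candidate expansions, so topk can return at most min(beam_size, V) of
# them -- the beam is effectively truncated to the vocabulary size.
vocab_size = 2   # e.g. keep/delete labels in sentence compression
beam_size = 5

lprobs = torch.log_softmax(torch.randn(1, vocab_size), dim=-1)  # scores at t=1
k = min(beam_size, lprobs.size(-1))  # 2, not 5
scores, indices = torch.topk(lprobs, k=k)
print(scores.shape)  # torch.Size([1, 2]): only 2 hypotheses survive
```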
My proposal to fix that: in the early steps of decoding, when
V^t < B
we pad our prediction probabilities with some -inf scores. We won't need that later on anyway, once t is large enough that V^t > B.
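A minimal sketch of the idea, assuming plain PyTorch tensors rather than fairseq's internal beam state (the helper name `topk_with_padding` and the toy sizes are hypothetical, for illustration only): pad the log-probabilities with -inf columns so that `topk(beam_size)` is always valid; a padded slot can never beat a real candidate.

```python
import math
import torch

def topk_with_padding(lprobs: torch.Tensor, beam_size: int):
    # Sketch of the proposed fix (not fairseq's actual code): if the candidate
    # pool is smaller than the beam, append -inf columns before calling topk.
    num_candidates = lprobs.size(-1)
    if num_candidates < beam_size:
        pad = lprobs.new_full(
            (*lprobs.shape[:-1], beam_size - num_candidates), -math.inf
        )
        lprobs = torch.cat([lprobs, pad], dim=-1)
    # Any returned index >= num_candidates is a padding slot with score -inf;
    # downstream code can ignore those hypotheses.
    return torch.topk(lprobs, k=beam_size)

vocab_size, beam_size = 2, 5
lprobs = torch.log_softmax(torch.randn(1, vocab_size), dim=-1)
scores, indices = topk_with_padding(lprobs, beam_size)
print(scores)  # 2 finite scores followed by 3 -inf placeholders
```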
I have tried it locally and it seems to be working fine, with just some additional code in sequence_generator.py. Would like to hear your thoughts on this.