clab / dynet

DyNet: The Dynamic Neural Network Toolkit

Padding bidirectional LSTM/GRU/RNN encoder #1121

Open miguelballesteros opened 6 years ago

miguelballesteros commented 6 years ago

Hi everyone,

Let's suppose I have a bidirectional LSTM encoder that is manually batched, and let's also assume that my input sentences may have different lengths, so I need to add PAD vectors to some of them. My FW LSTM will then have some PAD vectors at the end, and my BW LSTM will have some PAD vectors at the beginning (which is not nice). I can of course mask them to run attention over the outputs, etc., but if I want to use the summary of the LSTMs, it will include some useless information that may add noise and decrease my accuracy.
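For concreteness, this is the kind of setup I mean (just an illustrative sketch; the ids and padding value are made up):

```python
PAD = 0                                   # padding id (illustrative)
sents = [[5, 8, 2], [7, 3, 9, 4, 6]]      # two sentences of different lengths
T = max(len(s) for s in sents)

# per-timestep batches for the FW LSTM: the short sentence gets PAD at the end
fwd_ids = [[s[t] if t < len(s) else PAD for s in sents] for t in range(T)]
# naive batches for the BW LSTM: reversing the padded sequence puts PAD
# *before* the short sentence's real tokens
bwd_ids = list(reversed(fwd_ids))
```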

In PyTorch there are pack_padded_sequence and pad_packed_sequence, which use CUDA to make this fast and, if I'm not mistaken, do exactly what I want (http://pytorch.org/docs/0.3.0/nn.html?highlight=pack_padded_sequence#torch.nn.utils.rnn.pack_padded_sequence and http://pytorch.org/docs/0.3.0/nn.html?highlight=pad_packed_sequence#torch.nn.utils.rnn.pad_packed_sequence).

I wonder if we have the same thing in DyNet.

And if not, what is the best strategy to work around this?

Thanks a lot! Miguel

neubig commented 6 years ago

We do not have this, but I've been thinking it'd be a nice feature to have. In the meantime, you could do something like what we've implemented in xnmt: https://github.com/neulab/xnmt/blob/master/xnmt/lstm.py#L93
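Roughly, the idea is to keep PAD from ever sitting in front of real tokens, and to read each sentence's summary at its true last position. A minimal sketch of that kind of workaround in the Python API (not the xnmt code itself; the sizes and names below are just illustrative):

```python
import dynet as dy

VOCAB_SIZE, EMB_DIM, HID_DIM, PAD = 10000, 64, 128, 0   # illustrative sizes / pad id

model = dy.ParameterCollection()
EMB = model.add_lookup_parameters((VOCAB_SIZE, EMB_DIM))
fwd = dy.VanillaLSTMBuilder(1, EMB_DIM, HID_DIM, model)
bwd = dy.VanillaLSTMBuilder(1, EMB_DIM, HID_DIM, model)

def encode(sents):
    """sents: list of word-id lists of different lengths (no padding yet)."""
    dy.renew_cg()
    lens = [len(s) for s in sents]
    T = max(lens)
    # forward direction: real tokens first, PAD at the tail
    fwd_ids = [[s[t] if t < len(s) else PAD for s in sents] for t in range(T)]
    # backward direction: reverse only each sentence's real tokens,
    # so PAD again ends up at the tail instead of the beginning
    rev = [list(reversed(s)) for s in sents]
    bwd_ids = [[s[t] if t < len(s) else PAD for s in rev] for t in range(T)]

    fwd_out = fwd.initial_state().transduce([dy.lookup_batch(EMB, ids) for ids in fwd_ids])
    bwd_out = bwd.initial_state().transduce([dy.lookup_batch(EMB, ids) for ids in bwd_ids])

    # summary per sentence: read each LSTM at that sentence's true last position,
    # so the PAD steps that follow are never used
    summaries = []
    for b, L in enumerate(lens):
        f = dy.pick_batch_elem(fwd_out[L - 1], b)
        g = dy.pick_batch_elem(bwd_out[L - 1], b)
        summaries.append(dy.concatenate([f, g]))
    return summaries

summaries = encode([[5, 8, 2], [7, 3, 9, 4, 6]])   # e.g. two sentences of length 3 and 5
```

If you also need clean per-timestep outputs (e.g. for attention), you would additionally mask the outputs at the padded positions, which is closer to what the linked xnmt code handles.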

miguelballesteros commented 6 years ago

Thanks @neubig !