Open · mitchellstern opened this issue 7 years ago
Would this solve your problems: set_dropout_masks?
Works for VanillaLSTMBuilder, not sure about BiLSTM
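Roughly like this (a sketch against the Python bindings; the dimensions and the loop are just for illustration):

```python
import numpy as np
import dynet as dy

DIM = 8  # illustrative size, not from the original report
pc = dy.ParameterCollection()
lstm = dy.VanillaLSTMBuilder(1, DIM, DIM, pc)
lstm.set_dropout(0.5)

dy.renew_cg()
for batch_size in (2, 3):
    state = lstm.initial_state()
    # Regenerate the dropout masks for this batch size. Per the docs this
    # has to happen AFTER initial_state(), so the masks live in the current graph.
    lstm.set_dropout_masks(batch_size)
    x = dy.inputTensor(np.zeros((DIM, batch_size)), batched=True)
    h = state.add_input(x).output()
    print(h.npvalue().shape)
```

Regenerating the masks per call at least keeps the mask shape consistent with each batch within one graph.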
I was getting segfaults earlier when I was trying to use that method, but I'm having trouble reproducing the issue now. Perhaps I wasn't properly heeding this warning in the documentation: "You need to call this AFTER calling initial_state". Closing for now, thanks!
I think this is a workaround, but it doesn't solve the original problem above. We should fix this.
It seems that there is a workaround. I'd like to work on a proper solution.
Is this related to this error I'm getting now?
Now that DyNet has switched to variational dropout, a dropout mask needs to be stored so that the same mask can be applied across all time steps. Unfortunately this means that an LSTM cannot be used with different batch sizes in the same computation graph, since the initial mask will have the wrong shape for later batches.
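As a toy illustration of why the mask's batch dimension gets locked in (plain numpy, not DyNet's actual implementation):

```python
import numpy as np

# Variational dropout samples ONE mask per sequence and reuses it at every
# time step, so the mask's shape -- including its batch dimension -- is fixed
# when the sequence starts.
def sample_mask(dim, batch_size, p_drop, rng=np.random):
    keep = 1.0 - p_drop
    return rng.binomial(1, keep, size=(dim, batch_size)) / keep  # inverted scaling

mask = sample_mask(dim=8, batch_size=2, p_drop=0.5)
x_t = np.ones((8, 2))
print((x_t * mask).shape)  # fine: input batch size matches the stored mask
x_t = np.ones((8, 3))
# x_t * mask               # shape mismatch: the mask was built for batch size 2
```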
Here's a minimal example demonstrating the problem:
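(The original snippet isn't reproduced here; the sketch below, with made-up dimensions, shows the kind of code that triggers it.)

```python
import numpy as np
import dynet as dy

DIM = 8
pc = dy.ParameterCollection()
lstm = dy.VanillaLSTMBuilder(1, DIM, DIM, pc)
lstm.set_dropout(0.5)

dy.renew_cg()

# First use of the builder in this graph: the dropout masks pick up batch size 2.
s1 = lstm.initial_state()
h1 = s1.add_input(dy.inputTensor(np.zeros((DIM, 2)), batched=True)).output()

# Second use in the SAME graph with batch size 3: the stored masks still have
# batch size 2, so the elementwise multiply no longer lines up.
s2 = lstm.initial_state()
h2 = s2.add_input(dy.inputTensor(np.zeros((DIM, 3)), batched=True)).output()
h2.npvalue()
```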
This results in a dimension mismatch error.
Although the above code will run if both batch sizes are the same, we may not want to apply the same dropout mask to different examples. Additionally, if the first batch size is 2 but the second batch size is 1, broadcasting will result in an output with batch size 2 for the latter result, which is incorrect behavior.
The use case here is that I would like to run an LSTM over every fixed-size window within a sentence, then do this for multiple sentences within a batch. The batch size in each call is the length of the sentence.
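To make that concrete, a rough sketch (names and sizes here are illustrative, not my actual code):

```python
import numpy as np
import dynet as dy

WINDOW, DIM = 3, 8  # illustrative sizes
pc = dy.ParameterCollection()
lstm = dy.VanillaLSTMBuilder(1, DIM, DIM, pc)
lstm.set_dropout(0.5)

def encode_windows(word_vecs):
    # Batch every length-WINDOW window of one sentence together, so the
    # batch size equals the sentence length (wrap around at the end just
    # to keep all windows the same length in this sketch).
    n = len(word_vecs)
    windows = [[word_vecs[(i + t) % n] for t in range(WINDOW)] for i in range(n)]
    state = lstm.initial_state()
    for t in range(WINDOW):
        step = np.stack([w[t] for w in windows], axis=1)  # shape (DIM, n)
        state = state.add_input(dy.inputTensor(step, batched=True))
    return state.output()  # one output vector per window, batch size n

dy.renew_cg()
sentences = [[np.random.randn(DIM) for _ in range(5)],
             [np.random.randn(DIM) for _ in range(7)]]
# Two different sentence lengths mean two different batch sizes in the same
# graph, which is exactly where the shared dropout mask breaks down.
outputs = [encode_windows(s) for s in sentences]
```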
Is there any way around this? Perhaps a new mask could be generated for each new initial state within a given computation graph?