Open · mitchellstern opened this issue 7 years ago
Would this solve your problems: set_dropout_masks?
Works for VanillaLSTMBuilder, not sure about BiLSTM
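Roughly like this (a sketch against the Python bindings; the dimensions and the loop are just for illustration):

```python
import numpy as np
import dynet as dy

DIM = 8  # illustrative size, not from the original report
pc = dy.ParameterCollection()
lstm = dy.VanillaLSTMBuilder(1, DIM, DIM, pc)
lstm.set_dropout(0.5)

dy.renew_cg()
for batch_size in (2, 3):
    state = lstm.initial_state()
    # Regenerate the dropout masks for this batch size. Per the docs this
    # has to happen AFTER initial_state(), so the masks live in the current graph.
    lstm.set_dropout_masks(batch_size)
    x = dy.inputTensor(np.zeros((DIM, batch_size)), batched=True)
    h = state.add_input(x).output()
    print(h.npvalue().shape)
```

Regenerating the masks per call at least keeps the mask shape consistent with each batch within one graph.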
I was getting segfaults earlier when I was trying to use that method, but I'm having trouble reproducing the issue now. Perhaps I wasn't properly heeding this warning in the documentation: "You need to call this AFTER calling initial_state". Closing for now, thanks!
I think this is a workaround, but it doesn't solve the original problem above. We should fix this.
It seems that there is a workaround. I'd like to work on a proper solution.
Is this related to this error I'm getting now?
Now that DyNet has switched to variational dropout, a dropout mask needs to be stored so that the same mask can be applied across all time steps. Unfortunately this means that an LSTM cannot be used with different batch sizes in the same computation graph, since the initial mask will have the wrong shape for later batches.
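As a toy illustration of why the mask's batch dimension gets locked in (plain numpy, not DyNet's actual implementation):

```python
import numpy as np

# Variational dropout samples ONE mask per sequence and reuses it at every
# time step, so the mask's shape -- including its batch dimension -- is fixed
# when the sequence starts.
def sample_mask(dim, batch_size, p_drop, rng=np.random):
    keep = 1.0 - p_drop
    return rng.binomial(1, keep, size=(dim, batch_size)) / keep  # inverted scaling

mask = sample_mask(dim=8, batch_size=2, p_drop=0.5)
x_t = np.ones((8, 2))
print((x_t * mask).shape)  # fine: input batch size matches the stored mask
x_t = np.ones((8, 3))
# x_t * mask               # shape mismatch: the mask was built for batch size 2
```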
Here's a minimal example demonstrating the problem:
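(The original snippet isn't reproduced here; the sketch below, with made-up dimensions, shows the kind of code that triggers it.)

```python
import numpy as np
import dynet as dy

DIM = 8
pc = dy.ParameterCollection()
lstm = dy.VanillaLSTMBuilder(1, DIM, DIM, pc)
lstm.set_dropout(0.5)

dy.renew_cg()

# First use of the builder in this graph: the dropout masks pick up batch size 2.
s1 = lstm.initial_state()
h1 = s1.add_input(dy.inputTensor(np.zeros((DIM, 2)), batched=True)).output()

# Second use in the SAME graph with batch size 3: the stored masks still have
# batch size 2, so the elementwise multiply no longer lines up.
s2 = lstm.initial_state()
h2 = s2.add_input(dy.inputTensor(np.zeros((DIM, 3)), batched=True)).output()
h2.npvalue()
```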
This results in a dimension mismatch error.
Although the above code will run if both batch sizes are the same, we may not want to apply the same dropout mask to different examples. Additionally, if the first batch size is 2 but the second batch size is 1, broadcasting will result in an output with batch size 2 for the latter result, which is incorrect behavior.
The use case here is that I would like to run an LSTM over every fixed-size window within a sentence, then do this for multiple sentences within a batch. The batch size in each call is the length of the sentence.
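To make that concrete, a rough sketch (names and sizes here are illustrative, not my actual code):

```python
import numpy as np
import dynet as dy

WINDOW, DIM = 3, 8  # illustrative sizes
pc = dy.ParameterCollection()
lstm = dy.VanillaLSTMBuilder(1, DIM, DIM, pc)
lstm.set_dropout(0.5)

def encode_windows(word_vecs):
    # Batch every length-WINDOW window of one sentence together, so the
    # batch size equals the sentence length (wrap around at the end just
    # to keep all windows the same length in this sketch).
    n = len(word_vecs)
    windows = [[word_vecs[(i + t) % n] for t in range(WINDOW)] for i in range(n)]
    state = lstm.initial_state()
    for t in range(WINDOW):
        step = np.stack([w[t] for w in windows], axis=1)  # shape (DIM, n)
        state = state.add_input(dy.inputTensor(step, batched=True))
    return state.output()  # one output vector per window, batch size n

dy.renew_cg()
sentences = [[np.random.randn(DIM) for _ in range(5)],
             [np.random.randn(DIM) for _ in range(7)]]
# Two different sentence lengths mean two different batch sizes in the same
# graph, which is exactly where the shared dropout mask breaks down.
outputs = [encode_windows(s) for s in sentences]
```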
Is there any way around this? Perhaps a new mask could be generated for each new initial state within a given computation graph?