esvhd opened this issue 6 years ago
paper by Gal & Ghahramani, 2016.
Lua code available here
Introduces the Variational LSTM: the same dropout mask is reused at every time step of the recurrent connections.
For untied weights, a different dropout mask can be used for each gate.
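The key mechanic above can be sketched in a few lines of NumPy. This is not the paper's Lua code, just a minimal toy RNN illustrating the idea: the recurrent dropout mask is sampled once per sequence and reused at every time step (all function names here are hypothetical).

```python
import numpy as np

def variational_dropout_mask(shape, p, rng):
    """Sample one inverted-dropout mask to be reused at every time step."""
    return rng.binomial(1, 1 - p, size=shape) / (1 - p)

def run_sequence(xs, h0, W_x, W_h, p=0.5, rng=None):
    """Toy RNN: h_t = tanh(W_x x_t + W_h (h_{t-1} * mask)).

    Variational dropout samples `mask` once per sequence and applies it
    identically at every step, rather than resampling it each step as
    naive recurrent dropout would.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    mask = variational_dropout_mask(h0.shape, p, rng)  # one mask for all t
    h = h0
    for x in xs:
        h = np.tanh(W_x @ x + W_h @ (h * mask))
    return h
```

For the untied-weights variant, one would sample a separate mask per gate (input, forget, cell, output) instead of the single `mask` here, still holding each mask fixed across time steps.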
Monte Carlo (MC) dropout: predictions are obtained by keeping dropout active at test time, running 1000 forward passes, and averaging the model outputs, following equation (4) in the paper.
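The test-time averaging can be sketched as follows. This is a generic MC-dropout wrapper under the assumption that `predict_stochastic` is any model call with dropout left on; the function name and the returned standard deviation (as an uncertainty estimate) are my additions, not from the paper.

```python
import numpy as np

def mc_dropout_predict(predict_stochastic, x, n_samples=1000):
    """MC dropout: run the stochastic model n_samples times on the same
    input and average, approximating the predictive mean of eq. (4) in
    Gal & Ghahramani (2016). The sample std gives a rough uncertainty.
    """
    samples = np.stack([predict_stochastic(x) for _ in range(n_samples)])
    return samples.mean(axis=0), samples.std(axis=0)
```

For example, with a dummy predictor that randomly zeroes its output (a stand-in for a network with dropout enabled), the MC mean converges to the expected prediction as `n_samples` grows.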
Found some nice notes here on this topic, which compare the TensorFlow and PyTorch implementations.
tensorflow
pytorch
I. Explains the maths, and also covers the implementations in TensorFlow/Keras and PyTorch.
keras
II. Compares with Merity's AWD-LSTM-LM, which allows a different mask for each gate in the tied-weight LSTM. Detailed notes on the Keras LSTM.
III
Experiments / Results