esvhd opened this issue 6 years ago
paper by Gal & Ghahramani, 2016.
Lua code available here
Introduces the Variational LSTM: the same dropout mask is reused at every time step of the recurrent connections.
For untied weights, a different dropout mask can be used for each gate.
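The key mechanic above can be sketched in a few lines of NumPy. This is not the paper's Lua code, just a minimal toy RNN illustrating the idea: the recurrent dropout mask is sampled once per sequence and reused at every time step (all function names here are hypothetical).

```python
import numpy as np

def variational_dropout_mask(shape, p, rng):
    """Sample one inverted-dropout mask to be reused at every time step."""
    return rng.binomial(1, 1 - p, size=shape) / (1 - p)

def run_sequence(xs, h0, W_x, W_h, p=0.5, rng=None):
    """Toy RNN: h_t = tanh(W_x x_t + W_h (h_{t-1} * mask)).

    Variational dropout samples `mask` once per sequence and applies it
    identically at every step, rather than resampling it each step as
    naive recurrent dropout would.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    mask = variational_dropout_mask(h0.shape, p, rng)  # one mask for all t
    h = h0
    for x in xs:
        h = np.tanh(W_x @ x + W_h @ (h * mask))
    return h
```

For the untied-weights variant, one would sample a separate mask per gate (input, forget, cell, output) instead of the single `mask` here, still holding each mask fixed across time steps.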
Monte Carlo (MC) dropout: predictions are obtained by keeping dropout active at test time, running 1000 forward passes, and averaging the model outputs, following equation (4) in the paper.
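The test-time averaging can be sketched as follows. This is a generic MC-dropout wrapper under the assumption that `predict_stochastic` is any model call with dropout left on; the function name and the returned standard deviation (as an uncertainty estimate) are my additions, not from the paper.

```python
import numpy as np

def mc_dropout_predict(predict_stochastic, x, n_samples=1000):
    """MC dropout: run the stochastic model n_samples times on the same
    input and average, approximating the predictive mean of eq. (4) in
    Gal & Ghahramani (2016). The sample std gives a rough uncertainty.
    """
    samples = np.stack([predict_stochastic(x) for _ in range(n_samples)])
    return samples.mean(axis=0), samples.std(axis=0)
```

For example, with a dummy predictor that randomly zeroes its output (a stand-in for a network with dropout enabled), the MC mean converges to the expected prediction as `n_samples` grows.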
Found some nice notes here on this topic, which compare the TensorFlow and PyTorch implementations.
tensorflow
pytorch
I. Explains the maths, and also covers the implementations in TensorFlow/Keras and PyTorch.
keras
II. Compares with Merity's AWD-LSTM-LM, which allows a different mask for each gate in the tied-weight LSTM. Detailed notes on the Keras LSTM.
III
Experiments / Results