maxpumperla opened this issue 6 years ago (status: Open)
Right, we don't support dropout on recurrent activations. One thing to consider here (need to check) is that some implementations (not sure if this includes Keras?) use the same dropout mask for all time steps, rather than sampling a new mask independently at each time step. Consequently it won't be a simple drop-in change to add it. Another question is whether we want to support both options.
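For anyone following along, here is a minimal plain-Java sketch of the two masking strategies being discussed: one Bernoulli mask sampled per sequence and reused at every time step, versus an independent mask per time step. This is illustrative only (it is not DL4J or Keras internals), and the sizes and drop probability are placeholder values:

```java
import java.util.Random;

public class RecurrentDropoutMasks {
    public static void main(String[] args) {
        int timeSteps = 5;      // length of the sequence (example value)
        int units = 4;          // number of recurrent units (example value)
        double dropProb = 0.5;  // probability of dropping an activation
        Random rng = new Random(42);

        // Option A: one Bernoulli mask sampled per sequence and reused at every
        // time step (the "same dropout mask for all time steps" variant).
        double[] sharedMask = new double[units];
        for (int i = 0; i < units; i++) {
            sharedMask[i] = rng.nextDouble() < dropProb ? 0.0 : 1.0 / (1.0 - dropProb);
        }

        // Option B: an independent mask sampled at every time step.
        double[][] perStepMask = new double[timeSteps][units];
        for (int t = 0; t < timeSteps; t++) {
            for (int i = 0; i < units; i++) {
                perStepMask[t][i] = rng.nextDouble() < dropProb ? 0.0 : 1.0 / (1.0 - dropProb);
            }
        }

        // Either mask would be multiplied element-wise with the recurrent
        // activations at each time step during training (inverted dropout,
        // hence the 1 / (1 - dropProb) scaling of the kept units).
        System.out.println("Shared mask:          " + java.util.Arrays.toString(sharedMask));
        System.out.println("Per-step mask, t = 0: " + java.util.Arrays.toString(perStepMask[0]));
    }
}
```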
Hi, I was tuning params and I noticed that dropout does not have any effect when using the GravesLSTM layer. Code example: https://gist.github.com/anonymous/2cb8fba1e6bdf29b0bc655ae6c7b68fd
I consistently got the same results regardless of the dropout rate, e.g.:
Score for dropout: '0.8' at iteration 1 is 491.271145951138
Score for dropout: '0.5' at iteration 1 is 491.271145951138
Score for dropout: '0.2' at iteration 1 is 491.271145951138
Score for dropout: '0.8' at iteration 2 is 396.11172254765324
Score for dropout: '0.5' at iteration 2 is 396.11172254765324
Score for dropout: '0.2' at iteration 2 is 396.11172254765324
Score for dropout: '0.2' at iteration 3 is 371.7333969051259
Score for dropout: '0.5' at iteration 3 is 371.7333969051259
Score for dropout: '0.8' at iteration 3 is 371.7333969051259
Score for dropout: '0.5' at iteration 4 is 369.64327864362696
Score for dropout: '0.2' at iteration 4 is 369.64327864362696
Score for dropout: '0.8' at iteration 4 is 369.64327864362696
Is it possible to somehow approximate dropout with DropoutLayer? One mask per layer per batch would be enough.
Otherwise, great job!
Br, Zlatko
@zvreifnitz hm, dropout should be applied to the input (but not the recurrent) activations... I'll open a separate issue. In the meantime, a dropout layer before the LSTM layer should work as expected.
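To make that workaround concrete, here is a rough sketch of an explicit DropoutLayer placed in front of a GravesLSTM layer. It is untested and the exact builder methods vary across DL4J versions; the layer sizes and the 0.5 rate are placeholder assumptions:

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.inputs.InputType;
import org.deeplearning4j.nn.conf.layers.DropoutLayer;
import org.deeplearning4j.nn.conf.layers.GravesLSTM;
import org.deeplearning4j.nn.conf.layers.RnnOutputLayer;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class DropoutBeforeLstmSketch {
    public static void main(String[] args) {
        int nIn = 10;      // input features per time step (example value)
        int nHidden = 64;  // LSTM layer size (example value)
        int nOut = 3;      // output classes (example value)

        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .seed(12345)
                .list()
                // Explicit dropout layer placed in front of the LSTM, as suggested above.
                // Check the javadoc of your DL4J version for whether this value is the
                // drop probability or the retain probability.
                .layer(0, new DropoutLayer.Builder(0.5).build())
                .layer(1, new GravesLSTM.Builder()
                        .nOut(nHidden)
                        .activation(Activation.TANH)
                        .build())
                .layer(2, new RnnOutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                        .activation(Activation.SOFTMAX)
                        .nOut(nOut)
                        .build())
                // Lets DL4J infer nIn for each layer from the recurrent input shape.
                .setInputType(InputType.recurrent(nIn))
                .build();

        System.out.println(conf.toJson());
    }
}
```

With this layout, dropout is only applied to the LSTM's inputs; the recurrent (time-step to time-step) activations are untouched, which matches the behaviour described above.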
@zvreifnitz I've confirmed and fixed an issue with dropout not being applied to recurrent layers here: https://github.com/deeplearning4j/deeplearning4j/pull/4823 Thanks for flagging this (though next time, feel free to open a separate issue - recurrent dropout is a different issue from the one you reported).
@AlexDBlack I sincerely apologise, I misunderstood the wording: I read "recurrent" as referring to the network type rather than to the recurrent connections inside the unrolled cell.
@zvreifnitz sure, no worries - thanks :)
Small issue related to Keras import: Keras allows recurrent layers to apply dropout to their recurrent connections on their own (the recurrent_dropout argument), with a rate that can differ from the regular input dropout rate.