maxpumperla opened this issue 6 years ago (status: Open)
Right, we don't support dropout on recurrent activations. One thing to consider here (need to check) is that some implementations (not sure if this includes Keras?) use the same dropout mask for all time steps, rather than sampling a new mask independently at each time step. Consequently it won't be a simple drop-in change to add it. Another question is whether we want to support both options.
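For anyone following along, here is a minimal plain-Java sketch of the two masking strategies being discussed: one Bernoulli mask sampled per sequence and reused at every time step, versus an independent mask per time step. This is illustrative only (it is not DL4J or Keras internals), and the sizes and drop probability are placeholder values:

```java
import java.util.Random;

public class RecurrentDropoutMasks {
    public static void main(String[] args) {
        int timeSteps = 5;      // length of the sequence (example value)
        int units = 4;          // number of recurrent units (example value)
        double dropProb = 0.5;  // probability of dropping an activation
        Random rng = new Random(42);

        // Option A: one Bernoulli mask sampled per sequence and reused at every
        // time step (the "same dropout mask for all time steps" variant).
        double[] sharedMask = new double[units];
        for (int i = 0; i < units; i++) {
            sharedMask[i] = rng.nextDouble() < dropProb ? 0.0 : 1.0 / (1.0 - dropProb);
        }

        // Option B: an independent mask sampled at every time step.
        double[][] perStepMask = new double[timeSteps][units];
        for (int t = 0; t < timeSteps; t++) {
            for (int i = 0; i < units; i++) {
                perStepMask[t][i] = rng.nextDouble() < dropProb ? 0.0 : 1.0 / (1.0 - dropProb);
            }
        }

        // Either mask would be multiplied element-wise with the recurrent
        // activations at each time step during training (inverted dropout,
        // hence the 1 / (1 - dropProb) scaling of the kept units).
        System.out.println("Shared mask:          " + java.util.Arrays.toString(sharedMask));
        System.out.println("Per-step mask, t = 0: " + java.util.Arrays.toString(perStepMask[0]));
    }
}
```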
Hi, I was tuning params and I noticed that dropout does not have any effect when using the GravesLSTM layer. Code example: https://gist.github.com/anonymous/2cb8fba1e6bdf29b0bc655ae6c7b68fd
I consistently got the same results regardless of the dropout rate, e.g.:
Score for dropout: '0.8' at iteration 1 is 491.271145951138
Score for dropout: '0.5' at iteration 1 is 491.271145951138
Score for dropout: '0.2' at iteration 1 is 491.271145951138
Score for dropout: '0.8' at iteration 2 is 396.11172254765324
Score for dropout: '0.5' at iteration 2 is 396.11172254765324
Score for dropout: '0.2' at iteration 2 is 396.11172254765324
Score for dropout: '0.2' at iteration 3 is 371.7333969051259
Score for dropout: '0.5' at iteration 3 is 371.7333969051259
Score for dropout: '0.8' at iteration 3 is 371.7333969051259
Score for dropout: '0.5' at iteration 4 is 369.64327864362696
Score for dropout: '0.2' at iteration 4 is 369.64327864362696
Score for dropout: '0.8' at iteration 4 is 369.64327864362696
Is it possible to somehow approximate dropout with DropoutLayer? One mask per layer per batch would be enough.
Otherwise, great job!
Br, Zlatko
@zvreifnitz hm, dropout should be applied to the input (but not the recurrent) activations... I'll open a separate issue. In the meantime, a dropout layer before the LSTM layer should work as expected.
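To make that workaround concrete, here is a rough sketch of an explicit DropoutLayer placed in front of a GravesLSTM layer. It is untested and the exact builder methods vary across DL4J versions; the layer sizes and the 0.5 rate are placeholder assumptions:

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.inputs.InputType;
import org.deeplearning4j.nn.conf.layers.DropoutLayer;
import org.deeplearning4j.nn.conf.layers.GravesLSTM;
import org.deeplearning4j.nn.conf.layers.RnnOutputLayer;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class DropoutBeforeLstmSketch {
    public static void main(String[] args) {
        int nIn = 10;      // input features per time step (example value)
        int nHidden = 64;  // LSTM layer size (example value)
        int nOut = 3;      // output classes (example value)

        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .seed(12345)
                .list()
                // Explicit dropout layer placed in front of the LSTM, as suggested above.
                // Check the javadoc of your DL4J version for whether this value is the
                // drop probability or the retain probability.
                .layer(0, new DropoutLayer.Builder(0.5).build())
                .layer(1, new GravesLSTM.Builder()
                        .nOut(nHidden)
                        .activation(Activation.TANH)
                        .build())
                .layer(2, new RnnOutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                        .activation(Activation.SOFTMAX)
                        .nOut(nOut)
                        .build())
                // Lets DL4J infer nIn for each layer from the recurrent input shape.
                .setInputType(InputType.recurrent(nIn))
                .build();

        System.out.println(conf.toJson());
    }
}
```

With this layout, dropout is only applied to the LSTM's inputs; the recurrent (time-step to time-step) activations are untouched, which matches the behaviour described above.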
@zvreifnitz I've confirmed and fixed an issue with dropout not being applied to recurrent layers here: https://github.com/deeplearning4j/deeplearning4j/pull/4823 Thanks for flagging this (though next time, feel free to open a separate issue - recurrent dropout is a different issue from the one you reported).
@AlexDBlack I sincerely apologise, I misunderstood the wording: I read "recurrent" as referring to the network type rather than to the recurrent connections inside the unrolled cell.
@zvreifnitz sure, no worries - thanks :)
Small issue related to Keras import: Keras allows recurrent layers to apply dropout to their recurrent connections on their own (the recurrent_dropout argument), with a rate that can differ from the regular input dropout rate.