eriche2016 closed this 8 years ago.
I think I roughly understand the difference: is it meant to avoid a "double" bias term? For example, summing the outputs of two linear layers produces two bias terms, which doesn't match the standard RNN or LSTM equations (they have a single bias per gate). Am I correct?
@eriche2016 Yes, exactly.
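Here is a minimal sketch of the point in pure Python. It assumes a vanilla tanh RNN cell, `h_t = tanh(W_x x_t + W_h h_{t-1} + b)`, and the same algebra applies to each LSTM gate. The names `W_x`, `W_h`, `b_x`, `b_h` and the helper `linear` are hypothetical and chosen only for illustration. Summing two biased linear layers gives `W_x x_t + b_x + W_h h_{t-1} + b_h`. The two bias vectors only ever appear as their sum, so they act like a single bias `b = b_x + b_h`. The extra vector adds parameters without making the cell any more expressive.

```python
import math
import random


def linear(W, b, v):
    """Affine map W @ v + b; pass b=None for a bias-free layer."""
    out = [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]
    if b is not None:
        out = [o + b_i for o, b_i in zip(out, b)]
    return out


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def rand_vector(n, rng):
    return [rng.uniform(-1, 1) for _ in range(n)]


rng = random.Random(0)
input_size, hidden_size = 3, 4
W_x = rand_matrix(hidden_size, input_size, rng)   # input-to-hidden weights
W_h = rand_matrix(hidden_size, hidden_size, rng)  # hidden-to-hidden weights
b_x = rand_vector(hidden_size, rng)               # bias of the input layer
b_h = rand_vector(hidden_size, rng)               # bias of the hidden layer
x_t = rand_vector(input_size, rng)
h_prev = rand_vector(hidden_size, rng)

# Two biased linear layers summed: W_x x + b_x + W_h h + b_h ("double" bias)
two_bias = [math.tanh(a + c)
            for a, c in zip(linear(W_x, b_x, x_t), linear(W_h, b_h, h_prev))]

# Textbook form: one bias b = b_x + b_h, hidden-to-hidden layer has no bias
b = [p + q for p, q in zip(b_x, b_h)]
one_bias = [math.tanh(a + c)
            for a, c in zip(linear(W_x, b, x_t), linear(W_h, None, h_prev))]

# Same hidden state either way, because only the sum b_x + b_h matters
print(all(math.isclose(u, v, abs_tol=1e-12) for u, v in zip(two_bias, one_bias)))
```

Because of this, the two-bias version computes the same hidden state as the single-bias form but carries `hidden_size` extra parameters per gate. That is why many implementations drop the bias on one of the two layers.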
Can you give some explanation or a reference? Thanks.