Is it as simple as narrowing down to the modules I want to share through a series of fwd:get() and bwd:get() calls, then calling bwd:get(...):share(fwd:get(...), 'weight', 'bias', 'gradWeight', 'gradBias')?

That appears to be working. My biggest concern is that I'll break something to do with the nn.AbstractRecurrent definition.
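For concreteness, here is a minimal sketch of that pattern, using plain nn.Linear layers as stand-ins for whichever submodules are actually being tied (the get() indices are hypothetical and depend entirely on how fwd and bwd are built):

```lua
require 'nn'

-- Two toy "directions"; in practice these would be the containers holding
-- the forward and backward RNNs.
local fwd = nn.Sequential()
   :add(nn.Linear(10, 20))
   :add(nn.Tanh())

local bwd = nn.Sequential()
   :add(nn.Linear(10, 20))
   :add(nn.Tanh())

-- Tie bwd's first submodule to fwd's: after this call both modules point at
-- the same weight/bias (and gradient) storages.
bwd:get(1):share(fwd:get(1), 'weight', 'bias', 'gradWeight', 'gradBias')

-- Sanity check: a change made through one module is visible through the other.
fwd:get(1).weight:fill(0.5)
assert(bwd:get(1).weight[1][1] == 0.5)
```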
@willfrey Your first example above should work. What you mention in the second comment should also work.
Thank you!
In Deep Speech 2, they share the input-hidden weights between both directions of a bidirectional RNN. It's mentioned in the middle of page 5.

What is the proper way to do this for the various architectures, such as nn.LSTM, nn.FastLSTM, nn.GRU, or any nn.Recurrence?

For any nn.Recurrence instance, I think that I can do this:

But there is probably a more elegant way to do it, perhaps by only initializing a fwd RNN, using bwd = fwd:clone(); bwd:reset(), and then sharing the weights somehow.

For nn.LSTM, nn.FastLSTM, and nn.GRU, I don't have the faintest idea.

I'm still very much a Torch novice, so any help is appreciated!

Thanks.
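For reference, the clone-and-share idea mentioned above can be sketched like this, with a plain nn.Linear standing in for an RNN's input-to-hidden transform (which internal submodule actually holds those weights in nn.LSTM, nn.FastLSTM, or nn.GRU is an implementation detail of the rnn package, so the stand-in and the variable names are assumptions):

```lua
require 'nn'

-- Stand-in for the input-to-hidden transform of the forward RNN.
local fwdI2H = nn.Linear(13, 100)

-- clone(...) with parameter names makes a deep copy and then calls share(...)
-- on it, so the listed tensors point at the same storage as the original.
local bwdI2H = fwdI2H:clone('weight', 'bias', 'gradWeight', 'gradBias')

-- bwdI2H can now be used when assembling the backward RNN; any update to
-- fwdI2H's parameters is immediately visible through bwdI2H and vice versa.
fwdI2H.weight:fill(0.1)
assert(bwdI2H.weight[1][1] == 0.1)
```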