Hi, thanks for the great work. I wonder if there is a way to make copies of a time step of an LSTM. I am trying to implement beam search with a normal LSTM. This requires me to dynamically branch an LSTM sequence into several sequences. The branches have to share everything with the original branch, yet remain separate branches so that they can go forward independently. Is there currently a way to do this? Thanks.
@lifengjin Branching as you described isn't currently supported by AbstractRecurrent instances. What would your use-case look like?
I only need this for evaluation, so no BPTT.
Say I have
local parent = nn.Sequencer(nn.Sequential():add(nn.LSTM(n, m)):add(nn.LogSoftMax()))
The input is actually a concatenation of the current word and the previous output. But instead of a single previous output, we have several previous outputs. So for time step k, we take sparse representations of the top n classes and concatenate each with w_k. This is to implement some kind of beam search for parsing or word segmentation. At this time step we want to calculate p(C_k | w_1 ... w_{k-1}, w_k, C_1 ... C_{k-1}), where C_{k-1} may be any of c_1 ... c_n. I think the way to implement this is, at this time step, to do for each candidate c_i:
local child = parent:sharedClone()
for i = 1, #parent.cells do
   child.cells[i] = parent.cells[i]:clone()
   child.outputs[i] = parent.outputs[i]:clone()
end
Does this produce a new branch of the parent LSTM?
Yeah, but you should be OK with just:
local child = parent:sharedClone()
It will clone everything but continue to share the parameters between child and parent.
Since you only intend to use this for evaluation, sharedCloning it for each branch should work fine. It won't be the most efficient way, though, as you will be cloning a lot of useless stuff in the child. But it will work.
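For example, a minimal sketch of that approach (prefixInputs and beamSize are hypothetical placeholders; parent is the same Sequencer from your snippet):

parent:evaluate() -- evaluation only, so no BPTT bookkeeping is needed

-- run the shared prefix through the parent; a Sequencer takes a table of input tensors
local predictions = parent:forward(prefixInputs)

-- one sharedClone per surviving candidate: each clone shares its
-- parameters with parent but owns a copy of everything else, so the
-- branches can step forward independently from here on
local branches = {}
for b = 1, beamSize do
   branches[b] = parent:sharedClone()
end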
What if you manually record all the cells and outputs? Will that make a difference? Say:
parent:evaluate()
local proto = parent:sharedClone()
local prediction
for i = 1, #inputs do
   prediction = parent:forward(inputs[i]) -- just doing forward a bunch of times
end
for i = 1, #topk(prediction) do
   local child = proto:sharedClone()
   child.cells = parent.cells
   ...
@lifengjin Yeah, that would be faster. However, it is more complicated than that. You should reference the parent cells into a new table, otherwise both parent and child point to the same table (conflict). And don't forget to reference child.outputs and child.sharedClones as well.
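For example, a sketch of what I mean, reusing the names from your snippet (this assumes the cells, outputs and sharedClones tables are reachable on the module as in your code above):

local child = proto:sharedClone()
-- fresh tables: otherwise parent and child would mutate the same ones
child.cells, child.outputs, child.sharedClones = {}, {}, {}
for t = 1, #parent.cells do
   child.cells[t] = parent.cells[t] -- reference the tensor, don't clone it
   child.outputs[t] = parent.outputs[t]
end
for t, clone in pairs(parent.sharedClones) do
   child.sharedClones[t] = clone
end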
Yes. Thanks. Closing this.