Element-Research / rnn

Recurrent Neural Network library for Torch7's nn
BSD 3-Clause "New" or "Revised" License

Copy the whole thing #164

Closed lifengjin closed 8 years ago

lifengjin commented 8 years ago

Hi, thanks for the great work. I wonder if there is a way to make copies of a time step of an LSTM. I am trying to implement beam search with a normal LSTM. This requires me to dynamically branch an LSTM sequence into several sequences. The branches have to share everything with the original branch, yet remain separate branches so that they can go forward independently. Is there currently a way to do this? Thanks.

nicholas-leonard commented 8 years ago

@lifengjin Branching as you described isn't currently supported by AbstractRecurrent instances. What would your use-case look like?

lifengjin commented 8 years ago

I only need this during evaluation, so no BPTT. Say I have:

local parent = nn.Sequencer(nn.Sequential():add(nn.LSTM(n, m)):add(nn.LogSoftMax()))

The input is actually a concatenation of the current word and the previous output. But instead of a single previous output, we have several previous outputs: at time step k, we take sparse representations of the top n classes and concatenate them with w_k. This is to implement some kind of beam search for parsing or word segmentation. So at this time step, we want to calculate p(C_k | w_1 ... w_{k-1}, w_k, C_1 ... C_{k-1}), and we may have C_{k-1} = c_1 ... c_n. I think the way to implement this is, at this time step, to do the following for each c_n:

local child = parent:sharedClone()
for i = 1, #parent.cells do
    child.cells[i] = parent.cells[i]:clone()
    child.outputs[i] = parent.outputs[i]:clone()
end

Does this produce a new branch of the parent LSTM?

nicholas-leonard commented 8 years ago

Yeah, but you should be OK with just:

 local child = parent:sharedClone()

It will clone everything but continue to share the parameters between child and parent.

Since you only intend to use this for evaluation, sharedCloning it for each branch should work fine. It won't be the most efficient approach, though, as you will be cloning a lot of useless stuff into the child. But it will work.
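
For illustration, here is a minimal sketch of this approach; the sizes, the beam width, and the branches variable are made-up placeholders:

require 'rnn'

local n, m = 10, 5      -- hypothetical input/hidden sizes
local beamWidth = 3     -- hypothetical beam width

local parent = nn.Sequencer(nn.Sequential()
   :add(nn.LSTM(n, m))
   :add(nn.LogSoftMax()))
parent:evaluate()

-- one sharedClone per beam branch: parameters are shared with the
-- parent, while everything else (including recurrent state) is
-- cloned, so each branch can go forward independently
local branches = {}
for b = 1, beamWidth do
   branches[b] = parent:sharedClone()
end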

lifengjin commented 8 years ago

What if you manually record all the cells and outputs? Will that make a difference? Say:

parent:evaluate()
local proto = parent:sharedClone()
local prediction
for i = 1, #inputs do
    prediction = parent:forward(inputs[i])  -- just stepping forward a bunch of times
end
for i = 1, #topk(prediction) do
    local child = proto:sharedClone()
    child.cells = parent.cells
...
nicholas-leonard commented 8 years ago

@lifengjin Yeah, that would be faster. However, it is more complicated than that. You should copy the references to the parent cells into a new table; otherwise both parent and child point to the same table (conflict). And don't forget to do the same for child.outputs and child.sharedClones.
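
For illustration, a hedged sketch of that correction, reusing the field names from the snippets above (the exact internals may differ between rnn versions):

local child = proto:sharedClone()
-- give the child fresh table objects so parent and child no longer
-- point to the same table; the entries still reference the parent's
-- tensors, so nothing is deep-copied
child.cells, child.outputs = {}, {}
for i = 1, #parent.cells do
   child.cells[i] = parent.cells[i]
   child.outputs[i] = parent.outputs[i]
end
-- per the advice above, child.sharedClones likely needs the same
-- new-table treatment; its exact layout depends on the rnn version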

lifengjin commented 8 years ago

Yes. Thanks. Closing this.