Element-Research / rnn

Recurrent Neural Network library for Torch7's nn
BSD 3-Clause "New" or "Revised" License

Question regarding encoder-decoder coupler #110

Open cjmcmurtrie opened 8 years ago

cjmcmurtrie commented 8 years ago

Hi there, I have a question regarding the encoder-decoder coupler suggested in the examples here. My question is specifically regarding the decoder.

At the first time step, the decoder accepts the final embedding outputted by the encoder. After this first time step, the input for the next step is the decoder's output from the previous time step.

The way I implemented a similar model in Python in the past, the final embedding from the encoder was fed to the decoder repeatedly, at every time step. My understanding is that this is very useful for making sure the decoder consistently addresses the encoded vector, rather than greedily generating sequences according to its own output.

Have I understood the implementation correctly? If not, where is my understanding wrong? If so, what would be the best way to pass the encoded vector repeatedly to the decoder, rather than passing its own outputs back in at each time step?

For example, would one solution be to decorate the decoder with nn.Repeater(), rather than nn.Sequencer()? If this is the case, do you know if there is an example implementation of this using the library?

Thanks!

nicholas-leonard commented 8 years ago

@cjmcmurtrie You could use ZipTableOneToMany to build a new input sequence as you described. You then stack a nn.Sequencer(nn.JoinTable(...)) on top to concatenate each time-step's pair into a single tensor before feeding the sequence to your rnn.
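
For illustration, a minimal sketch of that pipeline, assuming encOut is the encoder's final output vector and decInSeq is a table of decoder input vectors (both names are placeholders):

    -- {encOut, {x1, ..., xT}} -> {{encOut, x1}, ..., {encOut, xT}}
    local zip = nn.ZipTableOneToMany()
    -- join each {encOut, x_t} pair into a single vector per time-step
    -- (JoinTable(1, 1): join along dim 1, inputs are 1D feature vectors)
    local join = nn.Sequencer(nn.JoinTable(1, 1))
    local decInputs = join:forward(zip:forward({encOut, decInSeq}))
    -- decInputs is a table of T vectors [encOut, x_t], ready for nn.Sequencer(rnn)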

cjmcmurtrie commented 8 years ago

@nicholas-leonard This would be useful if you wanted the output sequence to have a fixed length, presumably determined by the length of the input sequence. But if you are mapping between sequences of different lengths, I don't think pre-constructing the decoder's input sequence is the best way.

Have I misunderstood the purpose of nn.Repeater? The documentation states that this decorator is useful for convolutional RNN implementations. With convolutional RNN encoder-decoder models, you are usually feeding an image encoding repeatedly to a language model...

htw2012 commented 8 years ago

@nicholas-leonard Could you give an example like the one @cjmcmurtrie describes? I have also run into this problem.

nicholas-leonard commented 8 years ago

@cjmcmurtrie You can use Repeater if all you are doing is repeatedly forwarding the same input through an rnn for a fixed number of time-steps. Is this what you need for decoding?
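
For reference, a Repeater-based decoder would look roughly like this (featSize, hiddenSize, encOut and the step count are placeholder names):

    local nSteps = 10 -- fixed number of decoding steps (an assumption)
    -- forward the same encoded vector through the LSTM for nSteps time-steps
    local dec = nn.Repeater(nn.LSTM(featSize, hiddenSize), nSteps)
    local outputs = dec:forward(encOut) -- table of nSteps outputs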

@douwekiela @cjmcmurtrie @htw2012 @cheng6076 Could you provide links to some articles that describe such encoder-decoder implementations? Ideally, the paper would use a publicly available dataset of tractable size. The objective would be to provide a more complete end-to-end example that reproduces a paper.

cheng6076 commented 8 years ago

@cjmcmurtrie @htw2012 There are many ways to build an encoder-decoder architecture. To start with, I would recommend the paper 'Skip-thought vectors' (http://arxiv.org/abs/1506.06726); a Theano implementation is at https://github.com/ryankiros/skip-thoughts.

The most standard way is to use a separate encoder and decoder to process the source and target sequences respectively, where the last hidden state of the encoder is used to initialise the first hidden state of the decoder. During training, when we have access to the target sequence, we can feed the true word as input to the decoder, while at test time we need to feed the previously predicted word. To mitigate this gap we can use scheduled sampling, as in 'Scheduled sampling for sequence prediction with recurrent neural networks' (http://arxiv.org/abs/1506.03099).

Of course, it also makes sense to feed the encoded representation as an additional input to every hidden state of the decoder; that is more of a design choice to me. Besides, you can use attention to derive a soft alignment between encoder and decoder, as in 'Neural machine translation by jointly learning to align and translate' (http://arxiv.org/abs/1409.0473).
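
As a toy illustration of scheduled sampling at training time (all names here are placeholders; dec is assumed to be an nn.Sequencer-wrapped decoder that maps a 1-element token tensor to a 1D log-probability vector over the vocabulary):

    dec:remember('both') -- keep hidden state across single-step calls to forward
    local eps = 0.75     -- probability of feeding the ground truth; decay this over epochs
    local input = torch.Tensor{goToken}
    for t = 1, target:size(1) do
      local logProbs = dec:forward({input})[1]
      local _, pred = logProbs:max(1)
      -- coin flip: teacher forcing vs. the model's own prediction
      local nextTok = (torch.uniform() < eps) and target[t] or pred[1]
      input = torch.Tensor{nextTok}
    end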

htw2012 commented 8 years ago

@cheng6076 That's great.

Maybe in the 'encoder-decoder coupler' example we could add a test procedure that uses the previous decoder output as the current decoder input. We could then make training follow the same scheme as testing.

When I implement it, I need to change the code here:

    dec:zeroGradParameters()
    -- Forward pass
    local encOut = enc:forward(encInSeq)
    forwardConnect(encLSTM, decLSTM)
    local decOut = dec:forward(decInSeq)

decInSeq needs to be replaced by the previous decoder output, but I can't get the decoder output at each time-step. Is there any way I can do this? @cjmcmurtrie @nicholas-leonard @cheng6076 @douwekiela
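
For readers without the example at hand: forwardConnect copies the encoder LSTM's last output and cell state into the decoder LSTM's initial state. Roughly (a sketch; inputSeqLen stands for the encoder's input length, and details may differ from the actual example):

    function forwardConnect(encLSTM, decLSTM)
      decLSTM.userPrevOutput = nn.rnn.recursiveCopy(decLSTM.userPrevOutput, encLSTM.outputs[inputSeqLen])
      decLSTM.userPrevCell = nn.rnn.recursiveCopy(decLSTM.userPrevCell, encLSTM.cells[inputSeqLen])
    end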

nicholas-leonard commented 8 years ago

@htw2012 You can feed in one time-step at a time with remember('both'). Feed the predicted output as the next input. So basically, your decInSeq will be a table containing one element (the previous output).
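
A minimal sketch of that loop, assuming dec is the nn.Sequencer-wrapped decoder and goToken/maxOutLen are placeholders:

    dec:remember('both') -- keep hidden state across calls to forward
    local prevOutput = torch.Tensor{goToken} -- start-of-sequence token
    for t = 1, maxOutLen do
      local out = dec:forward({prevOutput})[1] -- a table containing one time-step
      local _, pred = out:max(1)
      prevOutput = torch.Tensor{pred[1]} -- feed the prediction back as the next input
    end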

htw2012 commented 8 years ago

I rewrote the code as follows:

  local numRecords = encInSeq:size(1) -- batch size
  local max_out = encInSeq:size(2)    -- maximum output length (here taken from the input length)
  local decOut = torch.Tensor(numRecords, max_out, lastEos) -- lastEos doubles as the output (vocabulary) size here
  local criterion = nn.SequencerCriterion(nn.ClassNLLCriterion())
  for num = 1, numRecords do
    encLSTM:remember() -- keep hidden state across forward calls
    decLSTM:remember()
    local decOutSubTable = torch.Tensor(max_out, lastEos)
    local curLineEncInSeq = encInSeq[num]
    local curLineEncOut = enc:forward(curLineEncInSeq)
    forwardConnect(encLSTM, decLSTM)
    local output = torch.Tensor{lastEos} -- feed the start sign to the decoder first, then reuse it
    for i = 1, max_out do
      local curDecOutputSeq = dec:forward(output)
      local predicSeq = curDecOutputSeq[1]
      decOutSubTable[i] = predicSeq
      local prob, preds = predicSeq:sort(1, true)
      output = torch.Tensor{preds[1]} -- this decoder output becomes the next time-step's input
      if output[1] == lastEos then
        break
      end
    end
    decOut[num] = decOutSubTable
    -- enc:forget()
    -- dec:forget()
  end
  -- split the batch into a table of time-steps and measure the loss once, after the loop
  decOut = nn.SplitTable(2, 1):forward(decOut)
  -- decOutSeq_testNew equals nn.SplitTable(2,1):forward(decOutSeq)
  local err = criterion:forward(decOut, decOutSeq_testNew)

I found that my implementation is a little complicated. At the same time, I have some questions:

1. I don't know how to use remember() or forget() at the proper time.
2. How do I use the remember('both') you mentioned to implement this?
3. I don't know where the error in my procedure is; it doesn't converge in the encoder-decoder testing stage. Could you give me some suggestions?

Thank you in advance.

@nicholas-leonard @cjmcmurtrie @cheng6076

nicholas-leonard commented 8 years ago

1. Use forget() between independent sequences. It basically zeros the previous hidden state h[t-1]. Use remember() to tell the Sequencer not to call forget before each call to forward.
2. You seem to be using it the right way in your example above.
3. What kind of error do you get?

htw2012 commented 8 years ago

I just found that my model's loss decreases in the training stage but increases in the test stage. It is not overfitting, because the test loss already increases early on. So I suspect that my test procedure has some error in it. @nicholas-leonard

nicholas-leonard commented 8 years ago

@htw2012 Make sure you are applying the same kind of logic in training as in testing.

htw2012 commented 8 years ago

@nicholas-leonard With an encoder-decoder model, the training stage is different from the test stage. During training we can use the ground-truth sequence as the decoder's input, as in local decOut = dec:forward(decInSeq) above, but at test time we can't get a ground-truth decInSeq; we can only use the previous decoder output as the current decoder input.

So that's the difference in the encoder-decoder model. Do you have any suggestions about this?

htw2012 commented 8 years ago

My code is a little awkward to rewrite for the test stage. Since we use a Sequencer to forward an input sequence, presenting one element at a time, the way AbstractRecurrent does, is not as convenient. Is there a good solution for this? @nicholas-leonard

nicholas-leonard commented 8 years ago

You can use Sequencer to forward one time-step at a time:

    rnn = nn.Sequencer(rnn)
    rnn:remember('both') -- so it doesn't call forget before each forward
    for t = 1, seqlen do
      local input = {inputs[t]} -- a sequence of one time-step
      rnn:forward(input)
    end

htw2012 commented 8 years ago

Yes, I can do that, thanks @nicholas-leonard. So rnn:remember('both') means the Sequencer will not call forget at the start of each call to forward. I don't know whether that will affect the next sequence, though. During training I did not call remember() before forward, and in backward I also did not call forget(). Then at test time I call remember() at the start of each sequence and forget() at the end of each sequence. Is that OK?

nicholas-leonard commented 8 years ago

With remember('both'), call forget() between independent sequences. So call remember() once. But call forget() before each new sequence.
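
In other words, the call pattern looks like this (sequences is a placeholder for your dataset):

    rnn:remember('both') -- call once, after construction
    for _, seq in ipairs(sequences) do
      rnn:forget() -- reset the hidden state before each independent sequence
      for t = 1, #seq do
        rnn:forward({seq[t]}) -- one time-step at a time
      end
    end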

leesunfreshing commented 7 years ago

Hi, as a beginner with Torch and LSTMs, I am wondering how the input and remember() work (apologies if these are simple questions :D). For example, suppose:

    input = torch.Tensor({{1,2,3,4,7,0,0,0},{5,4,3,2,1,7,0,0}})
    LSTM = nn.Sequencer(nn.LSTM(inputSize, outputSize))
    LSTM:remember()
    LSTM:forward(input)

Does the code above train two different LSTM networks, since two independent sequences are fed? And does leaving out remember(), or calling remember('both'), mean only one LSTM is trained, with the second sequence starting from the hidden state left by the first? Any help would be highly appreciated!! @nicholas-leonard @htw2012 @cheng6076 @sisirkoppaka

wangliangguo commented 7 years ago

@cjmcmurtrie Has this question been solved? I also have the same problem with the attention version.