Element-Research / rnn

Recurrent Neural Network library for Torch7's nn
BSD 3-Clause "New" or "Revised" License
941 stars 313 forks

Problems in Sentence Matching Task #320

Open zhzou2020 opened 8 years ago

zhzou2020 commented 8 years ago

I want to use an LSTM as a sentence encoder and then compute sentence similarity on top of it. But when I train the model, its parameters don't seem to change at all. I've tried other models before and they work properly, so I wonder whether something is wrong in my implementation of this one.

My model is as follows,

function ModelBuilder:make_net(w2v)
  require 'rnn'

  if opt.cudnn == 1 then
    require 'cudnn'
    require 'cunn'
  end

  local lookup = nn.LookupTable(opt.vocab_size, opt.vec_size) -- batch_size * seq_len
  lookup.weight:uniform(-0.25, 0.25)
  lookup.weight[1]:zero()

  local rnn = nn.Sequential()
  rnn:add(lookup)

  local input_size = opt.vec_size
  local lstm_hidden_sizes = loadstring(" return " .. opt.lstm_hidden_sizes)()
  for i, lstm_hidden_size in ipairs(lstm_hidden_sizes) do
    local r = nn.SeqLSTM(input_size, lstm_hidden_size)
    r.maskzero = true
    r.batchfirst = true
    rnn:add(r)
    input_size = lstm_hidden_size
  end

  rnn:add(nn.Select(2, -1)) -- last time step: batch_size * lstm_hidden_size
  -- (careful: with right-padded, mask-zeroed input this selects a padded step)

  local siamese_encoder = nn.ParallelTable()
  siamese_encoder:add(rnn)
  siamese_encoder:add(rnn:clone('weight', 'bias', 'gradWeight', 'gradBias'))

  local model = nn.Sequential()
  model:add(siamese_encoder)
  model:add(nn.JoinTable(1, 1))
  model:add(nn.Dropout(opt.dropout_p))
  model:add(nn.Linear(lstm_hidden_sizes[#lstm_hidden_sizes] * 2, opt.hidden_size))
  model:add(nn.Dropout(opt.dropout_p))
  model:add(nn.Linear(opt.hidden_size, 2))
  model:add(nn.LogSoftMax())

  if opt.cudnn == 1 then
    cudnn.convert(model, cudnn)
  end

  return model
end
zhzou2020 commented 8 years ago

@nicholas-leonard Could you please help me find out the bugs in my code? Thanks!

JoostvDoorn commented 8 years ago

From the information you provide, nothing seems to be wrong per se, but the example is not complete. If you need help, first isolate the problem in as little code as possible using (fake) data, and provide us with a self-contained example that reproduces the issue. Or adapt one of the examples to use your data and see whether the issue persists.
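A minimal reproduction along these lines might look as follows. This is only a sketch with fake data and hypothetical sizes (none of them from the original post); it mirrors the structure of the model above and checks whether the flattened parameters actually move after a few SGD steps:

```lua
require 'rnn'

-- Hypothetical toy sizes, chosen only for a quick reproduction.
local vocab, dim, hidden, batch, len = 100, 16, 8, 4, 5

local enc = nn.Sequential()
  :add(nn.LookupTable(vocab, dim))
  :add(nn.SeqLSTM(dim, hidden))
  :add(nn.Select(2, -1))          -- last time step: batch x hidden
enc:get(2).batchfirst = true

local model = nn.Sequential()
  :add(nn.ParallelTable()
    :add(enc)
    :add(enc:clone('weight', 'bias', 'gradWeight', 'gradBias')))
  :add(nn.JoinTable(1, 1))
  :add(nn.Linear(hidden * 2, 2))
  :add(nn.LogSoftMax())

local criterion = nn.ClassNLLCriterion()
local params, gradParams = model:getParameters()

-- Fake data: two batches of token ids, random binary labels.
local x = {torch.Tensor(batch, len):random(vocab),
           torch.Tensor(batch, len):random(vocab)}
local y = torch.Tensor(batch):random(2)

local before = params:clone()
for i = 1, 10 do
  gradParams:zero()
  local out = model:forward(x)
  criterion:forward(out, y)
  model:backward(x, criterion:backward(out, y))
  params:add(-0.1, gradParams)    -- plain SGD step
end
-- If this prints 0, the parameters really are not updating.
print((params - before):abs():max())
```

Note that `getParameters()` is called after the shared clone is created; calling it before sharing (or re-converting the model afterwards) can silently break parameter sharing.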

zhuang-li commented 8 years ago

Hi. Have you solved this problem? I am trying to do sentence similarity too and am hitting the same issue, except that where you use clone to create the second encoder, I just use the same LSTM. The results are absurdly bad: accuracy never improves, and the F-measure only reaches 0.20-0.25.

zhzou2020 commented 8 years ago

My model converges, but it still doesn't perform well. I suspect it overfits, so I am now training it on a larger dataset instead.

zhuang-li commented 8 years ago

Yes, I have exactly the same issue! The model converges but the results are pretty bad. The model is in fact a common baseline, so I don't believe it should perform this badly. I don't know how large your dataset is; I am using the dataset from http://alt.qcri.org/semeval2015/task1/, which contains 13,000 training instances. But whether I use 100, 8,000, or 13,000 of them, performance is similarly bad.

zhzou2020 commented 8 years ago

Maybe there's something wrong with the implementation of SeqLSTM; I'll try it with Theano later on.

zhuang-li commented 8 years ago

Probably, but I implemented the LSTM myself before and got the same problem. Then I switched to this module and saw no improvement. I am quite confused now; maybe the problem is the model itself, or just the way I coded it.

JoostvDoorn commented 8 years ago

It is probably something other than the SeqLSTM implementation, but you could try the cudnn implementation if you are unsure. It would be very helpful if you could give us something we can run, though. @deathlee you should definitely clone with shared parameters; otherwise the gradients are not stored.
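For anyone unsure about the difference: `clone('weight', 'bias', 'gradWeight', 'gradBias')` produces a second module whose tensors point at the same underlying storage, while a plain `clone()` copies them. A minimal check (using a hypothetical `nn.Linear`, but the same holds for the encoder above):

```lua
require 'nn'

local m      = nn.Linear(3, 2)
local shared = m:clone('weight', 'bias', 'gradWeight', 'gradBias')
local plain  = m:clone()

-- The shared clone aliases the same storage; the plain clone does not.
print(torch.pointer(m.weight:storage()) == torch.pointer(shared.weight:storage())) -- true
print(torch.pointer(m.weight:storage()) == torch.pointer(plain.weight:storage()))  -- false
```

With a plain clone, each branch of the siamese network accumulates gradients into its own copy, so the two encoders drift apart and updates to one do not reach the other.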

JoostvDoorn commented 8 years ago

Are you following Mueller et al.? You should probably use CSubTable instead of JoinTable.
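Mueller et al.'s Manhattan LSTM scores a pair of encodings as exp(-||h_left - h_right||_1) rather than concatenating them. A sketch of that similarity head with `nn` table layers (the `hidden` size is hypothetical, and this replaces the `JoinTable` plus classifier of the model above, not the encoders):

```lua
require 'nn'

local hidden = 50  -- hypothetical encoder output size

-- Similarity head in the style of Mueller et al.: exp(-||h1 - h2||_1).
local head = nn.Sequential()
  :add(nn.CSubTable())     -- h1 - h2
  :add(nn.Abs())           -- |h1 - h2|
  :add(nn.Sum(1, 1))       -- L1 norm over the feature dimension
  :add(nn.MulConstant(-1))
  :add(nn.Exp())           -- similarity in (0, 1]

-- Batched usage: two batches of 4 encodings -> 4 similarity scores.
local h1, h2 = torch.randn(4, hidden), torch.randn(4, hidden)
print(head:forward({h1, h2}))
```

The difference-based head forces the two branches to live in a shared embedding space, which is usually what makes tied-weight siamese encoders work; a concatenation followed by a linear layer does not impose that constraint.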

zhuang-li commented 8 years ago

Hi. I saw that Mueller et al. use a "tied-weight" LSTM, so basically I just run the same LSTM twice, once for the left sequence and once for the right. If we clone here, wouldn't we get two separate LSTMs rather than shared weights? I also tried these two functions (copied from the encoder-decoder example):

-- Pass the left LSTM's final hidden/cell state in as the right LSTM's initial state.
function LSTMSim:forwardConnect(llstm, rlstm)
    rlstm.layer.userPrevOutput = llstm.layer.output[self.seq_length]
    rlstm.layer.userPrevCell = llstm.layer.cell[self.seq_length]
end

-- Flow the right LSTM's gradient w.r.t. its initial state back into the left LSTM.
function LSTMSim:backwardConnect(llstm, rlstm)
    llstm.layer.gradPrevOutput = rlstm.layer.userGradPrevOutput
    llstm.layer.userNextGradCell = rlstm.layer.userGradPrevCell
end

to preserve the state and gradient. In my case, I believe this just passes the previous state and gradient from head to tail within the same LSTM, since I use one LSTM for both sequences.

I also tried to create two separate lstms. Didn't work either.

I'd really like to offer you something to run, if it's not too much trouble for you to look at. I am currently adding some comments to the code. Thank you.