Open zhzou2020 opened 8 years ago
@nicholas-leonard Could you please help me find out the bugs in my code? Thanks!
From the information you provide, nothing seems to be wrong per se, though the example is not complete. So if you do need help, first isolate your problem in as little code as possible with (fake) data, and provide us with a minimal runnable example that reproduces the issue. Or adapt one of the examples to use your data, and see if you still experience your issue.
Hi. Have you solved this problem? I am trying to do sentence similarity too and am getting the same issue, except that where you use clone to create the second encoder, I just use the same LSTM. The results are absurdly bad: the accuracy never improves, and the F-measure only reaches 0.20~0.25.
My model converges, but it still cannot reach a good performance. I assume the model overfits, so I am now training it on a larger dataset instead.
Yes, I got exactly the same issue! The model converges but the result is pretty bad. The model is, in fact, a common baseline, so I don't believe it should perform this badly. I don't know how large your dataset is; I am using the dataset "http://alt.qcri.org/semeval2015/task1/", which contains 13000 training instances. But whether I use 100, 8000, or 13000 of them, the performance is similarly bad.
Maybe there's something wrong with the implementation of SeqLSTM; I'll try it with Theano later on.
Probably, but I implemented the LSTM myself before and got the same problem. Then I switched to this module and haven't seen any improvement. I am quite confused now. Maybe the problem is the model itself, or just the way I coded it.
It is probably something other than the SeqLSTM implementation, but you could try the cudnn implementation if you are unsure. It would be very helpful if you could give us something that we can run, though. @deathlee you should definitely clone, otherwise the gradients are not stored.
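For reference, a Siamese branch in Torch is usually created with a parameter-sharing clone, so both encoders use the same weights while each keeps its own activation and gradient buffers for backprop. A minimal sketch with the `rnn` package (`inputSize` and `hiddenSize` are placeholders for your own dimensions):

```lua
require 'rnn'

local inputSize, hiddenSize = 100, 50  -- placeholder dimensions

-- one encoder branch: a SeqLSTM over the embedded tokens
local lstm = nn.SeqLSTM(inputSize, hiddenSize)

-- share weights, biases, and accumulated gradients with the clone;
-- outputs and gradInputs stay separate, so each branch can store
-- its own intermediate state for the backward pass
local lstmClone = lstm:clone('weight', 'bias', 'gradWeight', 'gradBias')

-- feed {leftSequence, rightSequence} through the two tied branches
local siamese = nn.ParallelTable()
   :add(lstm)       -- left sentence
   :add(lstmClone)  -- right sentence
```

Running the same module object twice instead does not work here, because the second forward pass overwrites the internal buffers the first pass needs for backprop.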
Are you following Mueller et al.? You should probably use CSubTable instead of JoinTable.
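For what it's worth, Mueller et al. score a pair as exp(-||h_left - h_right||_1) rather than concatenating the two encodings, which is why CSubTable fits better than JoinTable. A sketch of that similarity head, assuming the two final hidden states arrive as a table of two batch x hidden tensors:

```lua
require 'nn'

-- similarity head in the spirit of Mueller et al. (MaLSTM):
-- exp(-||h_left - h_right||_1), which maps into (0, 1]
local sim = nn.Sequential()
   :add(nn.CSubTable())      -- h_left - h_right
   :add(nn.Abs())            -- elementwise absolute value
   :add(nn.Sum(1, 1))        -- L1 norm over the feature dimension
   :add(nn.MulConstant(-1))  -- negate: -||.||_1
   :add(nn.Exp())            -- exp(-||.||_1)
```

With JoinTable the model has to learn the comparison from scratch, whereas this head builds the distance in directly.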
Hi. I saw that Mueller et al. use a "tied-weight" LSTM, so basically I just run the same LSTM twice, once over the left sequence and once over the right. If we clone here, wouldn't they be two separate LSTMs that don't share weights? I also tried to use these two functions (copied from the encoder-decoder example)
```lua
-- copy the first LSTM's final state into the second LSTM before its forward pass
function LSTMSim:forwardConnect(llstm, rlstm)
   rlstm.layer.userPrevOutput = llstm.layer.output[self.seq_length]
   rlstm.layer.userPrevCell = llstm.layer.cell[self.seq_length]
end

-- copy the second LSTM's input gradients back into the first for its backward pass
function LSTMSim:backwardConnect(llstm, rlstm)
   llstm.layer.gradPrevOutput = rlstm.layer.userGradPrevOutput
   llstm.layer.userNextGradCell = rlstm.layer.userGradPrevCell
end
```
to preserve the state and gradient, which here, I believe, just passes the previous state and gradient from the tail of one sequence to the head of the next, because I use the same LSTM for both sequences.
I also tried creating two separate LSTMs. That didn't work either.
I'd really like to give you something to run if it's not too much trouble. I am currently adding some comments to the code. Thank you.
I want to use an LSTM as a sentence encoder and then compute sentence similarity on top of it. But when I train the model, its parameters don't seem to change at all. I've also tried other models before, and they work properly, so I wonder if there's something wrong in my implementation of this model.
My model is as follows,