Element-Research / rnn

Recurrent Neural Network library for Torch7's nn
BSD 3-Clause "New" or "Revised" License

Character-level BiLSTM: take the first and last hidden state #426

Open allanj opened 6 years ago

allanj commented 6 years ago

I want to implement character embeddings with a BiLSTM as in this paper (Neural Architectures for Named Entity Recognition, Guillaume Lample et al.). Specifically, I feed the characters of a word into a BiLSTM, then concatenate the last hidden state of the forward LSTM with the first hidden state of the backward LSTM; that concatenation is the word representation I want.

However, I found this hard to do when words have variable lengths.

Let's say a word contains 3 characters (1, 2, and 3) and the maximum length is 5. The input to the BiLSTM will then be the embeddings of the following tokens:

1, 2, 3, 0, 0

But if I take the last hidden state, it corresponds to padding, since the last two positions are zero-padded. I also can't hard-code the third position, because each word has a different length and therefore a different position for its last real character.
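For concreteness, a minimal sketch of the selection problem: if the true word lengths are kept around, the forward output can be read at each word's last real time step instead of at the padded final position. All names and sizes here (`outputs`, `lengths`, the dimensions) are hypothetical:

```lua
require 'torch'

local maxLen, batch, hidDim = 5, 2, 4
-- hypothetical forward-LSTM outputs: maxLen x batch x hidDim
local outputs = torch.randn(maxLen, batch, hidDim)
-- true word lengths for each batch element
local lengths = torch.LongTensor{3, 5}

-- read the output at the last *real* character, not at position maxLen
local last = torch.Tensor(batch, hidDim)
for b = 1, batch do
   last[b] = outputs[lengths[b]][b]
end
```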

tastyminerals commented 6 years ago

This should not be the input to your BiLSTM. You first use a LookupTable to encode character indices into character vectors. Read the paragraph 4.1 Character-based models of words in your paper again.
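For example, a minimal sketch using the rnn library's LookupTableMaskZero, which maps the padding index 0 to an all-zero vector (the vocabulary and embedding sizes are made up):

```lua
require 'rnn'

local vocabSize, embDim = 50, 10
-- index 0 (the padding token) is mapped to an all-zero vector
local lookup = nn.LookupTableMaskZero(vocabSize, embDim)

local word = torch.LongTensor{1, 2, 3, 0, 0} -- padded character indices
local vectors = lookup:forward(word)         -- 5 x 10; rows 4 and 5 are zeros
```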

allanj commented 6 years ago

Yes, sorry, I didn't mention the embedding layer before the BiLSTM. The problem is still the same: the network is now (embedding layer + BiLSTM), with the same padded input.

But the problem remains.
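One way to handle the padding with this library, sketched below under assumptions (not tested): combine LookupTableMaskZero with two SeqLSTMs whose maskzero flag is set, run the backward LSTM on the time-reversed input (so right padding becomes left padding, which maskzero is designed for), then read the forward output at each word's true length and the backward output at the final step. The word contents, lengths, and sizes are invented for illustration:

```lua
require 'rnn'

local vocabSize, embDim, hidDim = 50, 10, 8
local maxLen, batch = 5, 2

-- characters -> embeddings; padding index 0 becomes a zero vector
local lookup = nn.LookupTableMaskZero(vocabSize, embDim)

-- maskzero makes zero-embedded (padded) steps produce zero output
local fwd = nn.SeqLSTM(embDim, hidDim); fwd.maskzero = true
local bwd = nn.SeqLSTM(embDim, hidDim); bwd.maskzero = true

-- two words, right-padded with 0: {1,2,3} and {4,5,6,7,8}
local words   = torch.LongTensor{{1,4},{2,5},{3,6},{0,7},{0,8}} -- maxLen x batch
local lengths = torch.LongTensor{3, 5}

local emb    = lookup:forward(words)   -- maxLen x batch x embDim
local fwdOut = fwd:forward(emb)        -- maxLen x batch x hidDim

-- reverse along time: right padding becomes left padding for the backward pass
local revIdx = torch.linspace(maxLen, 1, maxLen):long()
local bwdOut = bwd:forward(emb:index(1, revIdx))

-- concatenate the last real forward state with the final backward state
local wordRep = torch.Tensor(batch, 2 * hidDim)
for b = 1, batch do
   wordRep[b] = torch.cat(fwdOut[lengths[b]][b], bwdOut[maxLen][b])
end
```

The per-example loop over `lengths` is the key point: the forward LSTM's meaningful state sits at a different time step for every word, while the backward LSTM (fed the reversed, hence left-padded, sequence) always finishes at step maxLen.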