chaitjo / lstm-context-embeddings

Augmenting word embeddings with their surrounding context using bidirectional RNN
https://chaitjo.github.io/context-embeddings/
MIT License

What is the usage of the CNN filters? #1

Closed liaicheng closed 4 years ago

liaicheng commented 7 years ago

Hi, I have been learning about your model, and it's a good model. I am confused about the part "Convolution + maxpool layer for each filter size". What is that part used for? And what is the difference compared to just concatenating the fw and bw outputs, without the filters?

Thanks!

chaitjo commented 7 years ago

Hello, I am humbled by your interest! :)

You are correct: the bidirectional LSTM is the part that encodes contextual information into new embeddings for each word in a sequence of words. The CNN part is just a text classification model, exactly the same as the one published by Yoon Kim. More information can be found in the Implementation section of the README.

You can easily replace the CNN part with any other text classifier that takes as input the word embeddings of the words in a sequence.

The modelling of context into word embeddings is useful for many other NLP tasks; for example, Google's machine translation system uses it. An appropriate pipeline can be constructed on top of the proposed model, and the system can simply be trained end-to-end by backpropagating the error. I chose to demonstrate the model on text classification, but it is a very general idea!
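
To make the split concrete, here is a minimal sketch of the two-stage design in the TF 1.x style of that era. All names here (`encode_context`, `classify`, the shapes) are illustrative assumptions, not the repo's actual identifiers:

```python
import tensorflow as tf

def encode_context(embedded_words, hidden_size=128):
    # BiLSTM encoder: turns independent word embeddings into
    # context-aware ones (TF 1.x API).
    fw = tf.nn.rnn_cell.LSTMCell(hidden_size)
    bw = tf.nn.rnn_cell.LSTMCell(hidden_size)
    (out_fw, out_bw), _ = tf.nn.bidirectional_dynamic_rnn(
        fw, bw, embedded_words, dtype=tf.float32)
    return tf.concat([out_fw, out_bw], axis=-1)  # [batch, seq_len, 2*hidden]

def classify(context_embeddings, num_classes=2):
    # Swappable head: the repo uses Yoon Kim's CNN here, but any model
    # that consumes per-word embeddings works. Shown: mean-pool + linear.
    pooled = tf.reduce_mean(context_embeddings, axis=1)
    return tf.layers.dense(pooled, num_classes)

# embedded_words: [batch, seq_len, embedding_dim], e.g. word2vec lookups
embedded_words = tf.placeholder(tf.float32, [None, 50, 300])
logits = classify(encode_context(embedded_words))
```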

hadifar commented 7 years ago

Could you please tell me how I can encode contextual information into new embeddings for each word in a sequence of words? I looked into your code, but it's a bit complicated for me to figure out :(

chaitjo commented 7 years ago

Hi @AmirHadifar, thanks for the interest!

We propose to use a bidirectional LSTM to encode context into independent word embeddings (for example, word2vec embeddings). At each timestep of the LSTM network (here, one timestep per word in the sentence, left to right), the LSTM modifies the word's embedding based on the embeddings of the words to its left and right in the sequence.
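
As a minimal sketch (assumed names and shapes, not the repo's actual code): the forward LSTM's output at position t has seen words 1..t, and the backward LSTM's output at position t has seen words t..n, so concatenating them gives each word an embedding informed by both sides of its context.

```python
import tensorflow as tf

vocab_size, embedding_dim, hidden_size = 10000, 300, 128

token_ids = tf.placeholder(tf.int32, [None, None])         # [batch, seq_len]
w2v = tf.get_variable("w2v", [vocab_size, embedding_dim])  # pretrained word2vec
embedded = tf.nn.embedding_lookup(w2v, token_ids)          # context-independent

(fw_out, bw_out), _ = tf.nn.bidirectional_dynamic_rnn(
    tf.nn.rnn_cell.LSTMCell(hidden_size),
    tf.nn.rnn_cell.LSTMCell(hidden_size),
    embedded, dtype=tf.float32)

# fw_out[:, t] summarizes words up to t (left context); bw_out[:, t]
# summarizes words from t onward (right context). Their concatenation is
# the new, context-aware embedding for word t.
contextual = tf.concat([fw_out, bw_out], axis=-1)  # [batch, seq_len, 2*hidden]
```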

sunsidazzz commented 7 years ago

Hello @chaitjo, thanks for your code and your explanation of the CNN filters.

I am doing a project classifying human-written sentences versus machine-generated sentences (ones that don't make sense logically or grammatically), and I figured out that I should use an RNN model as the text classifier.

I kind of get the idea when you say: "You can easily replace the CNN part with any other text classifier that takes as input the word embeddings of the words in a sequence."

So with self.lstm_outputs_expanded as the input to the classification model, I can pick an RNN model from tf.nn to replace the CNN, right? But what should I do with the relu and maxpool? What are their functions?

I am new to TensorFlow and deep learning, so thank you in advance if you can give me more details and directions on how to replace the CNN model with an RNN model.

Thanks!

chaitjo commented 7 years ago

Hey @sunsidazzz, you are correct: at the self.lstm_outputs_expanded point, the embeddings have already been modified to incorporate the surrounding context. After this point, you can replace the CNN with any other binary text classifier.
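
For example, an RNN head along these lines would work (a sketch with assumed names; note that self.lstm_outputs_expanded is the 4-D tensor prepared for conv2d, so an RNN head would consume the 3-D tensor from just before that expand_dims):

```python
import tensorflow as tf

# contextual: the context-aware embeddings, [batch, seq_len, dim]
# (the 3-D tensor before the expand_dims that the CNN branch needs).
contextual = tf.placeholder(tf.float32, [None, 50, 256])

cell = tf.nn.rnn_cell.LSTMCell(128)
_, final_state = tf.nn.dynamic_rnn(cell, contextual, dtype=tf.float32)

# final_state.h is a fixed-size summary of the whole sentence; map it to
# two classes (human-written vs. machine-generated).
logits = tf.layers.dense(final_state.h, 2)
```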

Regarding relu and maxpooling: those are simply operations that are part of the CNN for text classification introduced by Kim. For a beginner, I point you to this excellent resource, which should clarify all your doubts about the CNN model: part 1 and part 2.
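
For intuition, here is roughly what one "Convolution + maxpool" branch does for a single filter size in Kim's model (a sketch; all shapes and names are assumptions): the convolution slides an n-gram detector across the sentence, relu keeps only positive responses, and max-pooling keeps each filter's single strongest response.

```python
import tensorflow as tf

# One branch for filter_size = 3 (i.e. trigram features).
seq_len, dim, num_filters, filter_size = 50, 256, 100, 3

# 4-D input, like self.lstm_outputs_expanded: [batch, seq_len, dim, 1]
x = tf.placeholder(tf.float32, [None, seq_len, dim, 1])

# Each filter spans `filter_size` consecutive words over the full
# embedding width, acting as an n-gram feature detector.
W = tf.get_variable("W", [filter_size, dim, 1, num_filters])
b = tf.get_variable("b", [num_filters])

conv = tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding="VALID")
h = tf.nn.relu(tf.nn.bias_add(conv, b))  # nonlinearity: keep positive responses

# Max-pool over all positions: for each filter, keep its single strongest
# activation anywhere in the sentence -> one feature per filter.
pooled = tf.nn.max_pool(h, ksize=[1, seq_len - filter_size + 1, 1, 1],
                        strides=[1, 1, 1, 1], padding="VALID")
# pooled: [batch, 1, 1, num_filters]; branches for the other filter sizes
# are concatenated before the final softmax layer.
```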

bpoti commented 6 years ago

Hi @chaitjo, I am trying to understand the embeddings part. So first we use word2vec (itself a shallow neural network) to get vectors (or embeddings) for words, and then we use an LSTM (RNN) on those vectors to update them with document context or conversation context?

chaitjo commented 6 years ago

Hey @bpoti, thanks for the interest, and sorry for the late reply. I believe the blog post explains what you're looking for: https://chaitjo.github.io/context-embeddings/