omerarshad opened this issue 3 years ago
Hi @omerarshad, it is just an example. With a transformer model as word_embedding_model, there is no need for a CNN layer.
However, if you use GloVe embeddings, then adding a CNN layer on top makes sense.
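For contrast, the transformer-only setup would look roughly like this (a minimal sketch following the usual sentence-transformers pattern; 'bert-base-uncased' is just a placeholder model name):

from sentence_transformers import SentenceTransformer, models

# The transformer already produces contextualized token embeddings,
# so mean pooling on top is enough for a fixed-size sentence vector
word_embedding_model = models.Transformer('bert-base-uncased')
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(), pooling_mode_mean_tokens=True)
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])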
Exactly my point. Is there a way to use word embeddings from BERT such that no cross-encoding is applied, so I only get individual word embeddings and then apply a CNN on them? Actually, I want to train a CNN that can take an input of 1000 words, and I just want to encode each word using BERT and pass it to the CNN.
is there a way to use word embeddings from BERT such that no cross-encoding is applied, so I only get individual word embeddings and then apply a CNN on them?
This does not make sense. Applying BERT only makes sense for getting contextualized word embeddings, not for getting embeddings of individual words.
Using GloVe (or word2vec, etc.) would be the much better solution.
You get this by replacing the corresponding line like this:
word_embedding_model = models.WordEmbeddings.from_text_file('glove.6B.300d.txt.gz')
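Putting it together, a minimal sketch of the full GloVe + CNN pipeline (reusing the CNN and pooling settings from the example discussed later in this thread; the GloVe file name is the one above):

from sentence_transformers import SentenceTransformer, models

# Static GloVe vectors: one fixed embedding per vocabulary word
word_embedding_model = models.WordEmbeddings.from_text_file('glove.6B.300d.txt.gz')

# The CNN adds local context windows (1/3/5-grams) on top of the static embeddings
cnn = models.CNN(in_word_embedding_dimension=word_embedding_model.get_word_embedding_dimension(), out_channels=256, kernel_sizes=[1, 3, 5])

# Mean pooling over the token representations yields one fixed-size sentence vector
pooling_model = models.Pooling(cnn.get_word_embedding_dimension(), pooling_mode_mean_tokens=True)

model = SentenceTransformer(modules=[word_embedding_model, cnn, pooling_model])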
Yes, but this type of solution has several issues: OOV words and a large word-vector file.
FastText can handle OOV better than w2v/GloVe, I think, due to its subword (character n-gram) tokenization. But as Nils said, it makes no sense to use a contextualized language model for generating single word embeddings. If you have a domain that does not work well with a pretrained word embedding model, you could train your own.
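As a minimal sketch of the OOV behavior, assuming gensim is available (the toy corpus and hyperparameters below are made up for illustration):

from gensim.models import FastText

# Toy corpus; replace with your own domain data
sentences = [['transformer', 'models', 'encode', 'context'],
             ['fasttext', 'uses', 'character', 'ngrams']]

model = FastText(sentences=sentences, vector_size=100, window=3, min_count=1, epochs=10)

# 'transformers' never appears in the corpus, but fastText still
# composes a vector for it from its character n-grams
oov_vector = model.wv['transformers']
print(oov_vector.shape)  # (100,)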
So in this example:

from sentence_transformers import SentenceTransformer, models

word_embedding_model = models.Transformer('bert-base-uncased')
cnn = models.CNN(in_word_embedding_dimension=word_embedding_model.get_word_embedding_dimension(), out_channels=256, kernel_sizes=[1, 3, 5])

# Apply mean pooling to get one fixed-sized sentence vector
pooling_model = models.Pooling(cnn.get_word_embedding_dimension(), pooling_mode_mean_tokens=True, pooling_mode_cls_token=False, pooling_mode_max_tokens=False)
model = SentenceTransformer(modules=[word_embedding_model, cnn, pooling_model])
What is the purpose of applying a CNN here? Isn't it better to just take the mean of the output of "word_embedding_model"? It looks like applying the CNN is just overhead. Can anyone explain?