Hello @n-ibrahimov01, yes, the reason for this is that `CharacterEmbeddings` are randomly initialized and only make sense if you train them on a downstream task first. So you can use them when training your own model. During model training, these embeddings will then get trained to make sense for that task.
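For example, a minimal training sketch in the spirit of the flair tutorials (the `UD_ENGLISH` corpus, tag type, and output path below are just illustrative choices, and the calls shown follow the flair API of around that time) could look like this:

```python
from flair.datasets import UD_ENGLISH
from flair.embeddings import WordEmbeddings, CharacterEmbeddings, StackedEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Load a corpus and define the downstream task (POS tagging here, as an example)
corpus = UD_ENGLISH()
tag_type = 'upos'
tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)

# CharacterEmbeddings start out random; they only become meaningful once the
# tagger below is trained and the character features are tuned for the task
embeddings = StackedEmbeddings([
    WordEmbeddings('glove'),
    CharacterEmbeddings(),   # trained jointly with the tagger
])

tagger = SequenceTagger(hidden_size=256,
                        embeddings=embeddings,
                        tag_dictionary=tag_dictionary,
                        tag_type=tag_type)

# Training updates the character embeddings along with the rest of the model
trainer = ModelTrainer(tagger, corpus)
trainer.train('resources/taggers/upos-with-char', max_epochs=10)
```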
If you're just interested in embedding text and not in training a downstream task model, you should use any of the pre-trained embeddings, such as `WordEmbeddings`, `FlairEmbeddings`, or `BertEmbeddings`.
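For that use case, something along these lines (standard flair usage; `'glove'` and `'news-forward'` name pre-trained models that ship with the library) gives you meaningful vectors without any training:

```python
from flair.data import Sentence
from flair.embeddings import WordEmbeddings, FlairEmbeddings, StackedEmbeddings

# Pre-trained embeddings produce meaningful vectors out of the box
embeddings = StackedEmbeddings([
    WordEmbeddings('glove'),
    FlairEmbeddings('news-forward'),
])

sentence = Sentence('The grass is green .')
embeddings.embed(sentence)

# Each token now carries the concatenated pre-trained vector
for token in sentence:
    print(token.text, token.embedding.shape)
```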
Why don't large pre-trained character embedding models exist yet?
@Hellisotherpeople the `FlairEmbeddings` are large pre-trained character embeddings. They're different in that `FlairEmbeddings` are contextualized and pre-trained, whereas `CharacterEmbeddings` are uncontextualized and need to be trained on a task. We did some comparisons of the two in our COLING 2018 paper; at least on the tasks we looked at, Flair embeddings were much better, and when we used them, task-trained character features were no longer necessary.
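To illustrate the contextualization point, a small sketch (the sentences are made up; `'news-forward'` is one of the shipped pre-trained language models) showing that the same surface form gets different vectors in different contexts:

```python
from flair.data import Sentence
from flair.embeddings import FlairEmbeddings

flair_emb = FlairEmbeddings('news-forward')

sent_a = Sentence('I deposited cash at the bank .')
sent_b = Sentence('We had a picnic on the river bank .')
flair_emb.embed(sent_a)
flair_emb.embed(sent_b)

# 'bank' is token 5 in the first sentence and token 7 in the second
vec_a = sent_a.tokens[5].embedding
vec_b = sent_b.tokens[7].embedding

# Non-zero difference: the vector depends on the surrounding context
print((vec_a - vec_b).abs().max())
```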
I should have read your paper more closely 😁.
:D no worries - will close this issue, but feel free to reopen if you have more questions!
This is the code that I am using:
`emb_1` and `emb_2` return different values.
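Whatever the exact snippet, that behaviour follows from the random initialization described earlier: every freshly constructed `CharacterEmbeddings` instance starts from different weights. A minimal sketch consistent with that observation (the names `emb_1`/`emb_2` mirror the ones mentioned above; everything else is an illustrative assumption):

```python
from flair.data import Sentence
from flair.embeddings import CharacterEmbeddings

# Two separately constructed CharacterEmbeddings objects start from different
# random weights, so the same text yields different vectors from each of them
char_emb_1 = CharacterEmbeddings()
char_emb_2 = CharacterEmbeddings()

sent_1 = Sentence('The grass is green .')
sent_2 = Sentence('The grass is green .')
char_emb_1.embed(sent_1)
char_emb_2.embed(sent_2)

emb_1 = sent_1.tokens[0].embedding   # vector for 'The' from the first instance
emb_2 = sent_2.tokens[0].embedding   # vector for 'The' from the second instance

print(emb_1[:5])
print(emb_2[:5])  # differs, because neither instance has been trained on a task
```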