flairNLP / flair

A very simple framework for state-of-the-art Natural Language Processing (NLP)
https://flairnlp.github.io/flair/

Get character embedding #103

Closed dongfang91 closed 6 years ago

dongfang91 commented 6 years ago

Hello Alan,

Thank you for your research publication "Contextual String Embeddings for Sequence Labeling". I tried to use your pre-trained model in my research.

So far, I have found that your language model can produce forward or backward word embeddings. I am wondering whether I can also get the forward or backward embedding for each character in the sentence. Could you please tell me which code I should modify?

Thanks!

alanakbik commented 6 years ago

Hello @dongfang91, thanks for your interest!

The code that computes the character states is in the CharLMEmbeddings class, specifically in its _add_embeddings_internal method. The key steps are as follows.

As you can see in the __init__ method of CharLMEmbeddings, you first load the language model like this:

from flair.models import LanguageModel
lm = LanguageModel.load_language_model('path/to/language/model/file')

Then, as you can see in _add_embeddings_internal of CharLMEmbeddings, you prepare a list of sentences in which every sentence is padded to the length of the longest sentence in the batch. To start, you can simply pass a single sentence without padding, but it must still be wrapped in a list.

So, if your sentence is "the grass is green", you can pass it the following way:

all_hidden_states_in_lm = lm.get_representation(['the grass is green'])

The list brackets around the sentence are important, because otherwise it will get interpreted as a list of characters and produce an incorrect embedding.
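To see why the brackets matter, here is a small plain-Python sketch that does not need flair. It assumes the method iterates over its argument and treats each item as one sentence; count_sequences is a hypothetical stand-in for that behaviour, not a flair function.

```python
def count_sequences(strings):
    """Hypothetical stand-in: count how many 'sentences' the argument yields."""
    return len(list(strings))


sentence = 'the grass is green'

# Wrapped in a list: one sentence of 18 characters.
as_list = count_sequences([sentence])

# Bare string: Python iterates character by character,
# so it looks like 18 separate one-character sentences.
as_string = count_sequences(sentence)

print(as_list, as_string)
```

The first call sees one sentence, the second sees eighteen, which is why the unwrapped string leads to the wrong result.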

The get_representation call then gives you a tensor containing the hidden state of each character.
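If you later want to pass several sentences at once, the padding step mentioned above could be sketched roughly like this. It is a simplification under stated assumptions: pad_batch is a hypothetical helper, the space used as the pad character is an assumption, and the real code in CharLMEmbeddings may do more (for example adding boundary markers or reversing the text for a backward model). The input list is assumed to be non-empty.

```python
from typing import List


def pad_batch(sentences: List[str], pad_char: str = ' ') -> List[str]:
    """Right-pad every sentence to the length of the longest one.

    Hypothetical helper; pad_char is an assumed choice.
    """
    longest = max(len(s) for s in sentences)
    return [s + pad_char * (longest - len(s)) for s in sentences]


batch = pad_batch(['the grass is green', 'hello'])

# Every string in the batch now has the same number of characters.
lengths = [len(s) for s in batch]
print(lengths)

# With a loaded model, the padded list would take the place of the
# single-sentence list in the get_representation call shown earlier.
```

Because all strings have equal length, each one contributes the same number of character positions. The states at padded positions do not correspond to real characters and should be ignored.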

Hope this helps!

dongfang91 commented 6 years ago

Thank you so much for your comments! That helps!