Hello @dongfang91, thanks for your interest!
The code to get the character states can be found in the CharLMEmbeddings class, specifically in its _add_embeddings_internal method. The key steps are the following.
As you can see in the __init__ method of CharLMEmbeddings, you first load the language model like this:
from flair.models import LanguageModel
lm = LanguageModel.load_language_model('path/to/language/model/file')
Then, as you can see in _add_embeddings_internal of CharLMEmbeddings, you prepare a list of sentences that are padded to the length of the longest sentence in the batch (see the padding sketch below). To start, you can simply pass a single sentence without padding, but it must still be wrapped in a list.
So, if your sentence is "the grass is green", you can pass it the following way:
all_hidden_states_in_lm = lm.get_representation(['the grass is green'])
The list brackets around the sentence are important; otherwise the string will be interpreted as a list of single characters and produce an incorrect embedding.
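If you later want to pass several sentences in one batch, they all need to have the same length. A minimal sketch of such padding (my own illustration using spaces as padding characters, not code taken from the flair source):
sentences = ['the grass is green', 'the sky is blue']
# pad every sentence with spaces to the length of the longest one
longest = len(max(sentences, key=len))
padded = [s + ' ' * (longest - len(s)) for s in sentences]
all_hidden_states_in_lm = lm.get_representation(padded)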
The call to get_representation then gives you a tensor containing the hidden state of each character.
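To read out the embedding of each individual character, you can index into that tensor. A small sketch, assuming the tensor is laid out as (character position, sentence index, hidden size); the exact layout may differ between flair versions, so it is worth checking all_hidden_states_in_lm.shape first:
sentence = 'the grass is green'
all_hidden_states_in_lm = lm.get_representation([sentence])
# one hidden-state vector per character of the single sentence in the batch
for position, character in enumerate(sentence):
    char_embedding = all_hidden_states_in_lm[position, 0, :]
    print(character, char_embedding.shape)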
Hope this helps!
Thank you so much for your comments! That helps!
Hello Alan,
Thanks for your great paper "Contextual String Embeddings for Sequence Labeling"; I have been trying to use your pre-trained model in my research.
I found that your language model can produce forward or backward word embeddings, and I am wondering whether I can also get the forward or backward embedding for each character in a sentence. Could you please tell me what code I should modify?
Thanks!