UKPLab / germeval2017-sentiment-detection

Sentence Embeddings used in the GermEval-2017 Submission

How to use the pre-trained embeddings, e.g. twitter_wiki_germeval_1000.bin? And what is the window size of the CNN in the paper? #1

Closed · jingliao132 closed this issue 4 years ago

jingliao132 commented 5 years ago

Hello, UKPLab. I have downloaded the pre-trained embeddings in binary format. How can they be turned into embedding vectors for a given vocabulary? Do you provide any interface like gensim?

Besides, I could not find the setup of the CNN window size, but the CNN-non-static model in the (Kim, 2014) work that you refer to suggests the window size should be defined. Is the tweet length fixed in the input? If so, what is the window size value?

In general, I am not quite clear about how you preprocess the original tweets. Could you please describe the procedure or recommend any resources on initializing embeddings for tweets?

Thanks a lot.

Wuhn commented 4 years ago

Hi @jingliao132 ,

sent2vec already provides an interface to use the embeddings:

import sent2vec

# Load a pre-trained model, e.g. twitter_wiki_germeval_1000.bin
model = sent2vec.Sent2vecModel()
model.load_model('model.bin')
# Embed a single sentence ...
emb = model.embed_sentence("once upon a time .")
# ... or several sentences at once
embs = model.embed_sentences(["first sentence .", "another sentence"])

You can then use the resulting embedding vectors in any way (e.g., as input for a sentence classifier).
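For example, a rough sketch of that last step could look as follows (the labelled toy sentences and the scikit-learn classifier are only illustrative assumptions, not what was used for the submission):

import sent2vec
from sklearn.linear_model import LogisticRegression

# Illustrative toy data -- in the GermEval setting these would be German
# tweets with their sentiment labels.
sentences = ["das ist super .", "das ist wirklich schlecht ."]
labels = ["positive", "negative"]

model = sent2vec.Sent2vecModel()
model.load_model("twitter_wiki_germeval_1000.bin")

# One embedding vector per sentence; the result can be fed directly
# into any off-the-shelf classifier.
X = model.embed_sentences(sentences)

clf = LogisticRegression()
clf.fit(X, labels)
print(clf.predict(model.embed_sentences(["was für ein schöner tag ."])))

Any other classifier (an SVM, a feed-forward network, etc.) works the same way, since the embeddings are just fixed-size numeric vectors.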

Regarding the window size: Sent2vec uses dynamic context windows as described in the paper. If the question refers to the n-gram size, we use the default value (2).

Sorry for the late reply, we hope you were still able to resolve the issues.

Best, Ji-Ung

jingliao132 commented 4 years ago

Thanks for your help. My issues are well-resolved.

Best regards, Jing
