flairNLP / flair

A very simple framework for state-of-the-art Natural Language Processing (NLP)
https://flairnlp.github.io/flair/

Add support for Sentence Transformers #1660

Closed alanakbik closed 4 years ago

alanakbik commented 4 years ago

The sentence transformers library has great pre-trained models to produce embeddings for entire sentences.

We should add a new DocumentEmbeddings class to support these embeddings in Flair.

shameelct commented 4 years ago

This commit does not cover contextual word embeddings, which the other transformer models supported in Flair provide, right? Are there any plans to add that?

alanakbik commented 4 years ago

Not sure if I understand - can you elaborate?

shameelct commented 4 years ago

As far as I understand, this is how a BERT embedding can be used in Flair to obtain contextual word embeddings:

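from flair.data import Sentence
from flair.embeddings import TransformerWordEmbeddings

# load a pre-trained multilingual BERT model as contextual word embeddings
bert_embedding = TransformerWordEmbeddings('bert-base-multilingual-cased')
sentence = Sentence('The grass is green .')

# embed the sentence in place; each token then carries its own vector
bert_embedding.embed(sentence)
for token in sentence:
    print(token)
    print(token.embedding)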

Is there a way to obtain contextual word embeddings from a Sentence Transformer using Flair, as in the BERT snippet above?

alanakbik commented 4 years ago

Sentence transformers are trained to produce one embedding for the whole sentence, so getting embeddings for single words is currently not possible through Flair. For words, regular transformer embeddings will probably be better.
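
For the sentence-level case, a minimal sketch might look like the following. It assumes the class added for this issue is `SentenceTransformerDocumentEmbeddings` (the name used in later Flair releases) and that the sentence-transformers package is installed alongside Flair. The model name 'bert-base-nli-mean-tokens' is only an example, and `embed_document` is a hypothetical helper name, not part of Flair.

```python
def embed_document(text, model_name="bert-base-nli-mean-tokens"):
    """Return a single embedding vector for the whole text."""
    # Imports live inside the function so that defining it does not
    # require flair and sentence-transformers to be installed.
    from flair.data import Sentence
    from flair.embeddings import SentenceTransformerDocumentEmbeddings

    # Assumption: model_name is a pre-trained sentence-transformers model id.
    embedding = SentenceTransformerDocumentEmbeddings(model_name)
    sentence = Sentence(text)

    # The vector is stored on the sentence itself, not on its tokens.
    embedding.embed(sentence)
    return sentence.get_embedding()
```

Calling `embed_document('The grass is green .')` should return one vector for the entire sentence. For per-token vectors, use `TransformerWordEmbeddings` as in the snippet above.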

shameelct commented 4 years ago

Cool thanks