facebookresearch / fastText

Library for fast text representation and classification.
https://fasttext.cc/
MIT License

About the embedding model of supervised #686

Closed 1049451037 closed 5 years ago

1049451037 commented 5 years ago

Hi, what model is used to train the word vectors for the supervised option?

EdouardGrave commented 5 years ago

Hi @1049451037,

The fastText supervised model works as follows: each word (and word ngram) is associated with a vector representation (a.k.a. embedding, of dimension 100 by default). A representation of the input text is obtained by averaging the embeddings of the words and ngrams that appear in the input. A linear classifier is then applied to this representation to obtain a score for each label. When training the model, the word/ngram embeddings and the linear classifier are learned jointly, in one step. Said differently, there are two matrices in a fastText supervised model: one containing the word embeddings, and one containing the classifier weights. These two matrices are learned jointly from the labeled data. Note that it is possible to initialize the word embeddings from a pre-trained model (for example, learned on unlabeled data with cbow or skipgram).
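To make the two-matrix picture concrete, here is a minimal NumPy sketch of the forward pass. This is not fastText's actual implementation; the matrix names, sizes, and token ids below are made up for illustration.

```python
import numpy as np

# Illustrative sizes only (fastText uses dim=100 by default).
vocab_size, dim, n_labels = 10_000, 100, 5
rng = np.random.default_rng(0)

# The two matrices learned jointly during supervised training:
# E holds the word/ngram embeddings, W holds the linear classifier weights.
E = rng.normal(scale=0.1, size=(vocab_size, dim))  # input embedding matrix
W = np.zeros((n_labels, dim))                       # output (classifier) matrix

def predict_probs(token_ids):
    """Average the embeddings of the tokens in the input text,
    then apply the linear classifier to get one score per label."""
    hidden = E[token_ids].mean(axis=0)   # text representation, shape (dim,)
    scores = W @ hidden                  # one score per label, shape (n_labels,)
    exp = np.exp(scores - scores.max())  # softmax over label scores
    return exp / exp.sum()

# Hypothetical token ids for a short input text.
probs = predict_probs(np.array([12, 87, 3051]))
print(probs.argmax(), probs.max())
```

For the pre-trained initialization mentioned above, the command-line tool accepts existing vectors via the `-pretrainedVectors` flag, e.g. `./fasttext supervised -input train.txt -output model -pretrainedVectors vectors.vec -dim 300` (the `-dim` value has to match the dimension of the pre-trained vectors).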

I hope this answers your question!

Best, Edouard