facebookresearch / fastText

Library for fast text representation and classification.
https://fasttext.cc/
MIT License

Question about pre-trained vectors #187

Closed zhongboyin closed 7 years ago

zhongboyin commented 7 years ago

What does each of the 300 dimensions represent? Does each dimension have a specific meaning?

lgalke commented 7 years ago

I suggest this introductory read. The vector is the result of the algorithm learning a good representation (one that lets it predict the context).

zhongboyin commented 7 years ago

Thanks for your answer, but I still want to know what each of the 300 dimensions represents.

bkj commented 7 years ago

Values in a single dimension are not intrinsically interpretable.

ghost commented 7 years ago

@zhongboyin This is an interesting question, though not an easy one to answer. A very simplistic answer is that, in practice, you have to treat word vectors as random vectors. There is no single vector dimension that represents, e.g., "gender".
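Here is a minimal sketch of that point, assuming gensim is installed and a pre-trained fastText `.vec` file (e.g. `wiki.en.vec`) has been downloaded; the file path and word pair are placeholders. The man/woman contrast turns out to be a direction spread over many coordinates rather than a single "gender" dimension:

```python
# Minimal sketch, assuming gensim and a locally downloaded pre-trained .vec file.
import numpy as np
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format("wiki.en.vec", binary=False)

# The "gender" contrast is a direction in the 300-D space, not one coordinate.
direction = vectors["man"] - vectors["woman"]

# Its energy is spread across many coordinates; no single dimension dominates.
contrib = direction ** 2 / np.sum(direction ** 2)
print("largest single-coordinate share:", contrib.max())
print("coordinates needed to cover 90% of the energy:",
      int(np.sum(np.cumsum(np.sort(contrib)[::-1]) < 0.9)) + 1)
```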

More precisely, word vectors are the result of a really simple neural network that ultimately tries to maximize the inner product of word vectors that occur in similar contexts and to minimize it for those that do not (see Chris McCormick's blog for a beginner's introduction). The learning algorithm does not impose much structure on the word vector coordinates. In general, you can assume that word vectors are isotropic, meaning that they behave much like random vectors, with independent coordinates and no preferred direction (see also Sanjeev Arora's blog post on the issue). The vectors are simply tuned so that the cosine similarity (the inner product of the normalized vectors) approaches one for similar words. Facebook's fastText takes this a step further by taking subword (character n-gram) features into account, but the underlying ideas are the same.
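To make the "cosine similarity is the inner product of the normalized vectors" point concrete, here is a small self-contained sketch that reads a pre-trained fastText `.vec` text file (one word plus 300 floats per line, after a header line); the file name and example words are placeholders:

```python
# Minimal sketch: cosine similarity = inner product of the L2-normalized vectors.
import numpy as np

def load_vectors(path, wanted):
    """Read only the requested words from a fastText .vec text file."""
    vecs = {}
    with open(path, encoding="utf-8") as f:
        next(f)  # skip the "n_words dim" header line
        for line in f:
            parts = line.rstrip().split(" ")
            if parts[0] in wanted:
                vecs[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vecs

def cosine(u, v):
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    return float(u @ v)  # inner product of the normalized vectors

vecs = load_vectors("wiki.en.vec", {"cat", "dog", "democracy"})
print(cosine(vecs["cat"], vecs["dog"]))        # relatively high for related words
print(cosine(vecs["cat"], vecs["democracy"]))  # much lower for unrelated words
```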

Does that mean that there is no structure in word vectors whatsoever? No, not quite, and this is where things become interesting… and more complicated. To give you a very rough idea of how you can model this: Picture word vectors as the result of a linear combination of a handful of "context vectors” (isotropic unit vectors a.k.a. “discourse vectors”). Think of them as building blocks of word vectors each of them representing a narrow contextual topic. You can approximate these via a technique called “sparse cording”, but that probably goes far beyond your original question. If you want to learn more, you'll find more details on the topic in this video and on Arora’s blog and the paper that goes along with it.