akutuzov / webvectors

Web-ify your word2vec: framework to serve distributional semantic models online
http://vectors.nlpl.eu/explore/embeddings/
GNU General Public License v3.0

Not able to calculate the opposite #10

Closed jkleiser closed 7 years ago

jkleiser commented 7 years ago

In the section to the right at http://vectors.nlpl.eu/explore/embeddings/en/calculator/ where you can do algebraic operations on vectors, it is not possible to calculate the "opposite" of a word by just entering that word in the negative field. I was curious to see e.g. what the opposite of dark_ADJ could be.

One way to get around this was adding a dummy word, e.g. boy_NOUN, in both the positive and negative field. The opposite of dark_ADJ, however, was not quite what I had expected. ;-)

Thanks a lot for this WebVectors site! I will take a closer look at the data and the coding when I find some time.

akutuzov commented 7 years ago

Hi @jkleiser, thanks for the interest!

I don't think you can get the opposite (the antonym) of the query word this way. The whole point of the Calculator is to perform algebraic operations on sets of vectors; providing only one word as input makes no sense: at best, the model can just output the nearest associates of that word. By providing a 'dummy' word in both fields, you essentially subtract the 'dark' vector from the origin, arriving at a vector that points in the opposite direction from 'dark'. However, the word nearest to this vector will almost certainly not be an antonym of 'dark'. This is because antonyms are often distributionally similar: 'dark' and 'bright' occur in almost identical contexts, so their vectors lie close together. Thus, in almost all cases the words nearest to the negated vector will have nothing to do with the original query.

Detecting antonyms via embedding models is a difficult research problem, precisely because antonyms are distributionally indistinct from synonyms.
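
To make the arithmetic concrete, here is a minimal sketch of what the dummy-word workaround computes. The vectors are invented toy values, not taken from any WebVectors model, and the helpers `combine` and `nearest` are hypothetical names used only for illustration, not the project's API.

```python
import math

# Toy 3-dimensional embeddings, invented for illustration only.
# 'dark' and 'bright' are deliberately placed close together, mimicking
# the fact that antonyms tend to occur in very similar contexts.
TOY_VECTORS = {
    "dark_ADJ":   [0.9, 0.4, 0.1],
    "bright_ADJ": [0.8, 0.5, 0.2],
    "gloomy_ADJ": [0.9, 0.3, 0.2],
    "boy_NOUN":   [0.1, 0.9, 0.3],
    "tax_NOUN":   [-0.7, -0.2, 0.6],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def combine(positive, negative, vectors):
    """Add the 'positive' word vectors and subtract the 'negative' ones."""
    dim = len(next(iter(vectors.values())))
    result = [0.0] * dim
    for word in positive:
        result = [r + x for r, x in zip(result, vectors[word])]
    for word in negative:
        result = [r - x for r, x in zip(result, vectors[word])]
    return result


def nearest(query, vectors, exclude=()):
    """Rank all words (except excluded ones) by cosine similarity to query."""
    scored = [
        (cosine(query, vec), word)
        for word, vec in vectors.items()
        if word not in exclude
    ]
    scored.sort(reverse=True)
    return [(word, round(score, 3)) for score, word in scored]


# The workaround from the issue: the dummy word cancels out, leaving -dark.
query = combine(["boy_NOUN"], ["boy_NOUN", "dark_ADJ"], TOY_VECTORS)
ranking = nearest(query, TOY_VECTORS, exclude={"boy_NOUN", "dark_ADJ"})

print("cosine(query, dark):", round(cosine(query, TOY_VECTORS["dark_ADJ"]), 3))
for word, score in ranking:
    print(word, score)
```

With these toy values, the unrelated word `tax_NOUN` ranks first, while `bright_ADJ` ends up near the bottom together with `gloomy_ADJ`: because the antonym sits close to `dark_ADJ`, it is pushed away from the negated vector just like the synonym is.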