anvaka / word2vec-graph

Exploring word2vec embeddings as a graph of nearest neighbors
https://anvaka.github.io/pm/#/galaxy/word2vec-wiki?cx=-4651&cy=4492&cz=-1988&lx=-0.0915&ly=-0.9746&lz=-0.2030&lw=0.0237&ml=300&s=1.75&l=1&v=d50_clean_small

clarification of distance metric #4

Open matanox opened 6 years ago

matanox commented 6 years ago

Hi,

This is the coolest project, really awesome :-) Would you kindly comment on the distance metric(s) implemented and the rationale behind them?

In many machine learning scenarios we pick e.g. cosine similarity on normalized vectors (so that we're working on the surface of a multi-dimensional sphere). Is the approach here very different, in that you look at the plain vector distance?

Thanks in advance for your comments!

anvaka commented 6 years ago

I'm using the spotify/annoy library for the index. By default its distance metric is set to 'angular':

Annoy uses Euclidean distance of normalized vectors for its angular distance, which for two vectors u,v is equal to sqrt(2(1-cos(u,v)))

And that is exactly what I'm using here. So, this is regular Euclidean distance between normalized vectors, which by the law of cosines works out to the formula above.
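
For anyone who wants to check the identity, here is a minimal sketch in plain Python. It is not Annoy's implementation and not the code used in this project; the helper names (`normalize`, `cosine_similarity`, `euclidean_distance`, `angular_distance`) and the sample vectors are made up for illustration. It compares the Euclidean distance between two normalized vectors with sqrt(2(1-cos(u,v))):

```python
import math


def normalize(v):
    """Scale v to unit length (hypothetical helper, for illustration only)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity(u, v):
    """cos(u, v) = (u . v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Plain Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def angular_distance(u, v):
    """sqrt(2 * (1 - cos(u, v))), the formula quoted from the Annoy docs.

    For unit vectors, |u - v|^2 = |u|^2 + |v|^2 - 2 (u . v) = 2 - 2 cos(u, v).
    The max() guards against a tiny negative value caused by rounding.
    """
    return math.sqrt(max(0.0, 2.0 * (1.0 - cosine_similarity(u, v))))


# Arbitrary sample vectors, chosen only for the demonstration.
u = [1.0, 2.0, 3.0]
v = [-2.0, 0.5, 4.0]

d_euclid = euclidean_distance(normalize(u), normalize(v))
d_angular = angular_distance(u, v)

# The two values should agree up to floating-point rounding.
print(d_euclid, d_angular)
```

Because sqrt(2(1-cos(u,v))) only decreases as the cosine grows, sorting neighbors by this distance gives the same order as sorting them by cosine similarity, highest first.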

matanox commented 6 years ago

hey thanks. great work.