gesistsa / sweater

👚 Speedy Word Embedding Association Test & Extras using R
GNU General Public License v3.0

all zero vectors will generate `NaN` #41

Open chainsawriot opened 1 year ago

chainsawriot commented 1 year ago

In general, cosine is not a meaningful similarity measure for all-zero vectors, because their norm is zero. But we can't change that.

https://github.com/chainsawriot/sweater/blob/6aebf710d813033c6d07f0268f12bd3e6badaee5/src/weat.cpp#L14

This causes a "divide by zero" problem: for an all-zero vector, deno_* is zero, so the sqrt of deno_* is also zero and the denominator is zero. The numerator (the dot product) is zero as well, so the result is 0/0, i.e. `NaN`.

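A simple solution is to imitate PyTorch and use an eps (PyTorch's default is 1e-8), e.g. as a lower bound on the denominator. The denominator can never be negative (due to the squaring and then rooting), so this only changes the zero and near-zero cases.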
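A minimal sketch of the eps idea, in plain C++ without Rcpp so it can be compiled on its own. The function names `cosine_unguarded` and `cosine_with_eps` are hypothetical and are not the ones in `weat.cpp`; the variable names only mirror the `deno_*` naming mentioned above. It assumes both vectors have the same length, and it clamps the denominator to at least `eps`, which is one possible way to apply the eps.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the unguarded computation:
// dot(x, y) / (sqrt(sum(x^2)) * sqrt(sum(y^2))).
// Assumes x and y have the same length.
double cosine_unguarded(const std::vector<double>& x,
                        const std::vector<double>& y) {
    double nume = 0.0, deno_x = 0.0, deno_y = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        nume += x[i] * y[i];
        deno_x += x[i] * x[i];
        deno_y += y[i] * y[i];
    }
    // For an all-zero input this evaluates 0.0 / 0.0, which is NaN.
    return nume / (std::sqrt(deno_x) * std::sqrt(deno_y));
}

// Same computation, but the denominator is clamped to at least eps.
// The default of 1e-8 follows the value quoted for PyTorch above.
double cosine_with_eps(const std::vector<double>& x,
                       const std::vector<double>& y,
                       double eps = 1e-8) {
    double nume = 0.0, deno_x = 0.0, deno_y = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        nume += x[i] * y[i];
        deno_x += x[i] * x[i];
        deno_y += y[i] * y[i];
    }
    // The product of two square roots is never negative, so taking the
    // maximum with eps only changes the zero and near-zero cases.
    double deno = std::sqrt(deno_x) * std::sqrt(deno_y);
    return nume / std::max(deno, eps);
}
```

With this guard an all-zero vector gives a similarity of 0 instead of `NaN`. Whether 0 is the right value to report for such a vector is a separate design question.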

chainsawriot commented 1 year ago

A quick demo:

```r
sweater:::raw_cosine(c(1,2,3), c(0,0,0))
```