Check the relation of VOID to the (weighted) mean of all word vectors

Here is a heatmap of the cosine similarities between VOID and three averages of the word vectors: uniform (the unweighted mean of all word vectors), frequency (the sum of all word vectors, weighted by word frequency) and frequency^0.75 (the sum of all word vectors, weighted by frequency raised to the power 0.75 and renormalised to be a probability distribution).
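As a rough illustration of how these three averages and their cosine similarities with VOID could be computed, here is a minimal sketch using NumPy. The names `weighted_mean`, `cosine_similarity`, `vectors`, `counts` and `void` are hypothetical, and the data is random toy data rather than the trained vectors. It also assumes that every weighting is normalised to sum to 1; this changes the norm of the result but not its cosine similarity with VOID.

```python
import numpy as np

def weighted_mean(vectors, weights):
    """Weighted sum of the rows of `vectors`, with weights renormalised to sum to 1."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return weights @ vectors

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy stand-in data (assumption): 5 words with 3-dimensional vectors and raw counts.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(5, 3))
counts = np.array([100, 50, 20, 5, 1])
void = rng.normal(size=3)  # placeholder for the trained VOID vector

averages = {
    "uniform": weighted_mean(vectors, np.ones(len(counts))),
    "frequency": weighted_mean(vectors, counts),
    "frequency^0.75": weighted_mean(vectors, counts ** 0.75),
}

for name, avg in averages.items():
    print(name, cosine_similarity(void, avg))
```

With real data, `vectors` would hold one row per vocabulary word, `counts` the corresponding corpus counts, and `void` the trained vector for VOID.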

These are derived from a normal training run with 10 epochs (parameters as in the drafts).

The L2 norms of each of these vectors are:

frequency         2.081686
uniform           2.655053
frequency^0.75    1.633744
VOID              2.888624

Conclusions: VOID and the frequency-weighted average are almost parallel. The difference in their L2 norms is likely accounted for by the slight dependence of word vector length on frequency, in the case of VOID. So it is not inaccurate to say that VOID is essentially the frequency-weighted sum of the word vectors.

benjaminwilson / word2vec-norm-experiments #7