Here is a heatmap of the cosine similarities of VOID with uniform (the plain average word vector), frequency (the sum of all word vectors weighted by frequency), and frequency^0.75 (the sum of all word vectors weighted by frequency raised to the 0.75 power, renormalised to a probability distribution).
These are derived from a normal training run with 10 epochs (parameters as in the drafts).
And the L2 norms of each:

uniform         2.655053
frequency       2.081686
frequency^0.75  1.633744
VOID            2.888624
Conclusions: VOID and the frequency-weighted average are almost parallel. The difference in L2 norm is likely accounted for by the slight dependence of word vector length on frequency, in the case of VOID. So it is not inaccurate to say that VOID is essentially the frequency-weighted sum of the word vectors.
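To make the comparison concrete, here is a minimal NumPy sketch of how the three weighted averages and the cosine similarities / L2 norms can be computed. The embedding matrix, counts, and VOID vector below are toy stand-ins (random data, with VOID modelled as a small perturbation of the frequency-weighted sum to mimic the near-parallel relationship observed); in the actual run they would come from the trained model and corpus.

```python
import numpy as np

# Toy stand-ins (assumptions): a (vocab_size x dim) embedding matrix and raw
# corpus counts. In the actual run these would come from the trained model.
rng = np.random.default_rng(0)
E = rng.normal(size=(1000, 64))          # word vectors, one row per word
counts = rng.integers(1, 10_000, size=1000).astype(float)

def weighted_sum(E, w):
    """Sum of word vectors under weights w, renormalised to a distribution."""
    w = np.asarray(w, dtype=float)
    return (w / w.sum()) @ E

uniform  = weighted_sum(E, np.ones(len(E)))   # plain average word vector
freq     = weighted_sum(E, counts)            # frequency-weighted
freq_075 = weighted_sum(E, counts ** 0.75)    # frequency^0.75, renormalised

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical VOID vector for illustration only: the frequency-weighted sum
# plus a small perturbation, so the two come out nearly parallel.
void = freq + rng.normal(scale=0.001, size=E.shape[1])

for name, v in [("uniform", uniform), ("frequency", freq),
                ("frequency^0.75", freq_075)]:
    print(f"{name:15s} cos(VOID, .) = {cosine(void, v):+.4f}  "
          f"L2 = {np.linalg.norm(v):.6f}")
```

Note that renormalising the weights to a probability distribution only affects the L2 norms, not the cosine similarities, since cosine similarity is scale-invariant.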