We train syn0 by picking its vectors directly from its parameterization as a sphere. Hence, syn0 now simply stores the angles, and we perform gradient descent on this angle-based representation.
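The angle-to-vector map can be sketched with standard hyperspherical coordinates: n-1 angles determine a point on the unit sphere in R^n, so any gradient step on the angles keeps the vector exactly unit-norm. This is an illustrative pure-Python sketch, not the repo's C implementation of angle2vec:

```python
import math

def angle2vec(thetas):
    # Map n-1 hyperspherical angles to a unit vector in R^n:
    #   x_0 = cos(t_0), x_1 = sin(t_0)cos(t_1), ..., x_{n-1} = sin(t_0)...sin(t_{n-2})
    vec = []
    sin_prod = 1.0
    for theta in thetas:
        vec.append(sin_prod * math.cos(theta))
        sin_prod *= math.sin(theta)
    vec.append(sin_prod)
    return vec

v = angle2vec([0.3, 1.1, 2.0])
print(sum(x * x for x in v))  # sums to 1: the point always lies on the unit sphere
```

Because the constraint is built into the parameterization, no renormalization step is needed after an update.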
[x] Verify that the core functions angle2der (which computes derivatives wrt angles), angle2vec (which converts the angle representation to the vector representation), and angleprecompute (which memoises results for fast computation) all work. This was done by writing a reference codegen.py that uses sympy to generate slow but correct (automatically computed) derivatives, which we check against our hand-written derivatives. We also add a new mode called -stress-test that feeds inputs to word2vec and checks them against our reference implementation.
[ ] Start real-world training jobs and see what the results look like
[ ] Make syn1neg also sampled from the sphere instead of zero-initialized
[ ] Rework word-analogy to use the solid angle of the sphere to compute a:b :: x:?. This will take the solid angle between a and b, and move along the same solid angle from x to create y
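The derivative-verification idea from the first checklist item can be sketched with sympy: build the parameterization symbolically, let sympy differentiate it, and assert that a hand-written derivative matches. This assumes the hyperspherical parameterization described above; the real codegen.py targets the full C code:

```python
import sympy as sp

def angle2vec_sym(thetas):
    # Symbolic version of the hyperspherical angle-to-vector map.
    vec, sin_prod = [], sp.Integer(1)
    for t in thetas:
        vec.append(sin_prod * sp.cos(t))
        sin_prod *= sp.sin(t)
    vec.append(sin_prod)
    return vec

def hand_derivative_x0_t0(t0):
    # Example hand-written derivative: d(x_0)/d(theta_0) = -sin(theta_0).
    return -sp.sin(t0)

t0, t1 = sp.symbols('t0 t1')
vec = angle2vec_sym([t0, t1])
auto = sp.diff(vec[0], t0)        # sympy's automatically computed derivative
hand = hand_derivative_x0_t0(t0)  # our hand-written derivative
assert sp.simplify(auto - hand) == 0
```

The same pattern extends to every component of angle2der: sympy is slow but trustworthy, so any mismatch points at a bug in the hand-written code.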
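One simplified reading of the planned word-analogy rework: since every word is stored as angles, the offset from a to b can be applied directly to x's angles to obtain y. This is only a componentwise sketch of the idea, not the actual solid-angle computation the TODO describes:

```python
def analogy_angles(a, b, x):
    # a : b :: x : y, in angle space: shift x's angles by the
    # angular offset that takes a to b. (Hypothetical helper; the
    # planned word-analogy uses the solid angle between a and b,
    # which is not just a componentwise difference.)
    return [tx + (tb - ta) for ta, tb, tx in zip(a, b, x)]

print(analogy_angles([0.1, 0.2], [0.3, 0.5], [1.0, 1.0]))
```

The resulting angle list feeds back through angle2vec, so y lands on the unit sphere by construction.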