djinn-anthrope opened this issue 5 years ago
Another ablation that would be nice to have: word2vec without learning-rate annealing. That is, remove the code that decays alpha as training progresses. Try this with both our best model and our baseline model.
In particular, the code to be killed would be:

```c
alpha = starting_alpha * (1 - word_count_actual / (real)(iter * train_words + 1));
if (alpha < starting_alpha * 0.001) alpha = starting_alpha * 0.001;
```
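With those two statements gone, alpha simply stays at starting_alpha for the whole run. A minimal self-contained sketch contrasting the two schedules (not word2vec.c itself; variable names mirror it, and the corpus/epoch sizes are made up):

```c
#include <stdio.h>

typedef float real; /* word2vec.c typedefs real as float */

int main(void) {
  const real starting_alpha = 0.025f;              /* word2vec's skip-gram default */
  const long long train_words = 1000000, iter = 5; /* made-up corpus/epoch sizes */

  for (long long word_count_actual = 0;
       word_count_actual <= iter * train_words;
       word_count_actual += train_words) {
    /* Stock schedule: linear decay, floored at 0.1% of starting_alpha. */
    real alpha = starting_alpha *
                 (1 - word_count_actual / (real)(iter * train_words + 1));
    if (alpha < starting_alpha * 0.001) alpha = starting_alpha * 0.001;

    /* Ablation: the decay is deleted, so the rate is just starting_alpha. */
    real alpha_ablated = starting_alpha;

    printf("words %lld/%lld: decayed alpha = %f, constant alpha = %f\n",
           word_count_actual, iter * train_words, alpha, alpha_ablated);
  }
  return 0;
}
```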
This would also require re-tuning alpha as a hyperparameter, since its current value was presumably chosen with the decay schedule in place...
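If we do re-tune, a coarse grid over fixed alpha values should be enough. A sketch of such a sweep, driving the word2vec binary via system() (the binary and corpus paths, output naming, and the particular alpha grid are assumptions; the flags themselves are standard word2vec.c options):

```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  /* Hypothetical grid around the stock default of 0.025. */
  const double alphas[] = {0.01, 0.025, 0.05, 0.1};
  const int n = sizeof alphas / sizeof alphas[0];
  char cmd[512];

  for (int i = 0; i < n; i++) {
    /* One training run per candidate alpha; decay code assumed removed. */
    snprintf(cmd, sizeof cmd,
             "./word2vec -train data.txt -output vec_alpha_%g.bin "
             "-alpha %g -cbow 0 -size 100 -negative 5 -iter 5 -binary 1",
             alphas[i], alphas[i]);
    if (system(cmd) != 0)
      fprintf(stderr, "run failed: %s\n", cmd);
  }
  return 0;
}
```

Each resulting vec_alpha_*.bin would then be scored on the usual evaluation to pick the new alpha.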
Along the same lines, two scoring-function variants that could go into the same ablation:

- v.w, not sigm(v.w)
- cos(theta), not sigm(v.w)
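For concreteness, a small standalone sketch of the three candidate scores on a word/context vector pair (helper names and the sample vectors are mine, not from word2vec.c):

```c
#include <math.h>
#include <stdio.h>

#define DIM 3

static float dot(const float *v, const float *w) {
  float s = 0.0f;
  for (int i = 0; i < DIM; i++) s += v[i] * w[i];
  return s; /* v.w: the raw dot product */
}

static float sigm(float x) {
  return 1.0f / (1.0f + expf(-x)); /* sigm(v.w): the stock word2vec score */
}

static float cosine(const float *v, const float *w) {
  /* cos(theta): dot product normalized by both vector lengths */
  return dot(v, w) / (sqrtf(dot(v, v)) * sqrtf(dot(w, w)));
}

int main(void) {
  const float v[DIM] = {0.2f, -0.5f, 0.7f};
  const float w[DIM] = {0.1f, 0.4f, 0.6f};
  printf("v.w        = %f\n", dot(v, w));
  printf("sigm(v.w)  = %f\n", sigm(dot(v, w)));
  printf("cos(theta) = %f\n", cosine(v, w));
  return 0;
}
```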