Open thomasahle opened 5 years ago
FastText supervised creates two matrices A and B, where A has dim rows and B has dim columns. Given a binary vector x, the output is then softmax(BAx). I hope this is a correct understanding.

It seems that when dim is greater than or equal to the number of distinct labels, we might as well have B equal the identity matrix. In my own experiments, what happens now is that B becomes highly numerically unstable (many 1e40, 1e-40, -nan, etc.), while A stays nicely bounded.

I wonder if it would be simple to check for this case? It should also make training faster, since fewer parameters would have to be updated.

Hi @thomasahle, your understanding of the model is correct. Are you suggesting we add a feature that checks that the dim parameter is not much greater than the number of labels?

Exactly. It never makes sense to set dim larger than min(output-dimension, input-dimension). And if it equals that value, we might as well have A or B equal to the identity matrix. This should save a fair bit of computation in those cases.