Open thomasahle opened 5 years ago
FastText supervised creates two matrices A and B, where A has dim rows and B has dim columns. Given a binary vector x, the output is then softmax(BAx). I hope this is a correct understanding.

It seems that when dim is greater than or equal to the number of distinct labels, we might as well have B equal the identity matrix. In my own experiments, what happens now is that B becomes highly numerically unstable (many 1e40, 1e-40, -nan, etc.), while A stays nicely bounded.

I wonder if it would be simple to check for this case? It should also make training faster, since fewer parameters would have to be updated.

Hi @thomasahle, your understanding of the model is correct. Are you suggesting we add a feature that checks that the dim parameter is not much greater than the number of labels?

Exactly. It never makes sense to set dim larger than min(output-dimension, input-dimension). And if it equals that value, we might as well have A or B equal to the identity matrix. This should save a fair bit of computation in those cases.