creat89 opened 6 years ago
I have found the reason. There is one word whose vector is all zeros. Once I delete that word, the sentence gets a numeric vector instead of -nan(ind). Why does this zero word vector affect the calculation of the sentence vector? How can I change this behavior?
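For context, a likely mechanism (an assumption based on how fastText's print-sentence-vector is commonly described: the sentence vector is an average of the word vectors, each divided by its L2 norm): a word with an all-zero vector produces a 0/0 division, and the resulting nan propagates through the average to every component of the sentence vector. A minimal sketch:

```python
import numpy as np

def sentence_vector(word_vectors):
    # Average of L2-normalized word vectors. A zero vector has norm 0,
    # so v / norm is 0/0 = nan, and the nan poisons the whole mean.
    return np.mean([v / np.linalg.norm(v) for v in word_vectors], axis=0)

vectors = [np.array([1.0, 0.0]),   # a normal word vector
           np.array([0.0, 0.0])]   # the all-zero word vector
print(sentence_vector(vectors))    # every component is nan
```

Deleting the zero word removes the 0/0 term, which matches the behavior you observed.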
Hello @creat89,
Thank you for your post. In order to reproduce the issue on my end, could you please post the full set of commands used to train the model and trigger the error you describe?
Thank you, Christian
Hello @cpuhrsch ,
This is one set of hyperparameters that causes the error:
-lr 0.062098028681721665 -dim 200 -wordNgrams 1 -minCount 3 -epoch 10 -minn 6 -maxn 6
The error happens using either cbow or skipgrams.
(The parameters may not look optimal, but I'm using Bayesian optimization to find the best combination.)
Hello @creat89,
I've not been able to reproduce your issue, but I think a word with an associated zero vector might be the problem here. If I recall correctly, there was an issue around that a long time ago. Could you try this again with a recent version of fastText and let me know if that resolves your issue?
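The usual fix for this class of bug is to guard the normalization: words whose vector norm is zero are skipped instead of divided by zero. A minimal sketch of that guard (my own illustration, not fastText's actual code):

```python
import numpy as np

def sentence_vector_guarded(word_vectors):
    # Average of L2-normalized word vectors, skipping zero-norm vectors
    # so no 0/0 division can occur.
    acc = np.zeros_like(word_vectors[0])
    count = 0
    for v in word_vectors:
        norm = np.linalg.norm(v)
        if norm > 0:          # the guard: ignore all-zero vectors
            acc += v / norm
            count += 1
    return acc / count if count else acc

vectors = [np.array([1.0, 0.0]), np.array([0.0, 0.0])]
print(sentence_vector_guarded(vectors))  # [1. 0.] -- zero vector ignored
```

If a recent fastText build still produces -nan(ind), that would suggest the guard is missing (or the problem lies elsewhere), which is useful to know.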
Thanks, Christian
Hello,
I'm using fastText and, for one document, I'm getting a vector full of -nan(ind) when I use the print-sentence-vector option. However, if I ask for the vector of each word (with print-word-vectors), all the words have a numerical vector. What could be the problem? Any idea where to look, so I can give you a better description of the problem?
The fastText model was trained by me (using the unsupervised method) with 300 dimensions. The document has 3478 words.
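One way to check whether the trained model contains a zero-vector word is to scan the exported .vec file. A minimal sketch, assuming the standard text format (a "count dim" header line, then one word followed by its values per line):

```python
import tempfile
import numpy as np

def find_zero_vectors(vec_path):
    # Scan a fastText .vec text file and report words whose vector
    # is entirely zero (the candidates for causing nan sentence vectors).
    zero_words = []
    with open(vec_path, encoding="utf-8") as f:
        next(f)  # skip the "count dim" header line
        for line in f:
            parts = line.rstrip().split(" ")
            word, values = parts[0], np.array(parts[1:], dtype=float)
            if not values.any():
                zero_words.append(word)
    return zero_words

# Tiny demo file in the same format (hypothetical words).
with tempfile.NamedTemporaryFile("w", suffix=".vec", delete=False) as f:
    f.write("2 2\nfoo 0.5 -0.1\nbar 0.0 0.0\n")
print(find_zero_vectors(f.name))  # ['bar']
```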