epfml / sent2vec

General purpose unsupervised sentence representations

Training fails without any error message for most hyperparameter combinations on Windows #32

Open mayankshrivastava opened 6 years ago

mayankshrivastava commented 6 years ago

I've built the package on Windows using GnuWin.

My first issue was with the pre-trained models linked on the GitHub page. When I tried to use them to encode sentences, I got an assertion failure: Assertion failed: (counts.size() == osz_). I never got around this.

Later, I tried to train a model on a subset of Wikipedia restricted to a certain domain, so that the embedding would be domain-specific. With most hyperparameter combinations, training fails without showing any error, and no bin file is written. Training completes only with a few select hyperparameter combinations, and even then it sometimes gets stuck after reaching 100% and writes no output. The attached image below shows a training run that reached 100% and then hasn't written any output for 4 days.

image

My input dataset is about 221 MB in size, with about 120k words having a word count > 5.

mayankshrivastava commented 6 years ago

For more context, the assertion fails in the following function:

void Model::setTargetCounts(const std::vector<int64_t>& counts) {
  assert(counts.size() == osz_);
  if (args_->loss == loss_name::ns) {
    initTableNegatives(counts);
  }
  if (args_->loss == loss_name::hs) {
    buildTree(counts);
  }
}

osz_ has a value of 0, whereas counts.size() is 54455, as per the screenshot above.
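
For anyone debugging the same failure, here is a minimal sketch of the same size comparison written as an explicit runtime check that reports both values. Note that checkTargetCounts is a hypothetical standalone helper, not part of sent2vec or fastText, and passing osz as a parameter is an assumption made so the snippet compiles on its own. Unlike assert, which is compiled out when NDEBUG is defined, this check runs in any build type.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not in sent2vec or fastText): performs the same
// comparison as the assert in Model::setTargetCounts, but throws an
// exception whose message contains both sizes.
void checkTargetCounts(const std::vector<int64_t>& counts, int64_t osz) {
  // A negative osz can never match a container size, so treat it as a mismatch.
  if (osz < 0 || counts.size() != static_cast<std::size_t>(osz)) {
    std::ostringstream msg;
    msg << "target count mismatch: counts.size() = " << counts.size()
        << ", osz = " << osz;
    throw std::invalid_argument(msg.str());
  }
}
```

With the values from this report (54455 counts, osz of 0) the helper throws, and the exception message carries both numbers, which could be logged before the process exits.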

mayankshrivastava commented 6 years ago

Also, I built the latest version of fastText separately and trained a skip-gram model with it. That training completed fine, with no such assertion errors.

tqx94 commented 4 years ago

Hi, I encountered the same issue shown in your screenshot. May I ask how you resolved it?

mayankshrivastava commented 4 years ago

Hi, I unfortunately was not able to find any solution to this.