epfml / sent2vec

General purpose unsupervised sentence representations

Assertion `counts.size() == osz_` failed when generating and using a model with a dimension other than 600 #12

Closed kyoungrok0517 closed 4 years ago

kyoungrok0517 commented 6 years ago

Hi, I see the following error when I train and use a model with 700 dimensions. I followed the example code for training. The same error occurs when I use the pre-trained 700-dimensional Wikipedia model.

[image: error output]

kyoungrok0517 commented 6 years ago

It happens when I use the nnSent subcommand.

kyoungrok0517 commented 6 years ago

It seems this happens whenever I use models I trained myself, regardless of the resulting dimension size.

mpagli commented 6 years ago

Hi @kyoungrok0517, I tried to reproduce your issue without success. I was able to train a new model with an arbitrary number of dimensions and use the nnSent functionality. Here are the commands I used:

./fasttext sent2vec -lr 0.2 -dim 100 -thread 25 -input 1M_wiki_sentences.txt -output model
./fasttext nnSent model.bin 1M_wiki_sentences.txt 10

Are you using the latest code? What command did you use to train your sent2vec model?

kyoungrok0517 commented 6 years ago

Hmm... I tested the code in Bash on Windows. That is a fairly unusual environment, so it may be the cause of the problem.

By the way, could you please tell me the specs of the system you're using? I get a segmentation fault when compiling on my MacBook Pro. I'm not sure whether your code is based on the latest version of fastText, but this looks like an old bug in fastText.

mpagli commented 6 years ago

I'm using Linux Mint 18.1 Serena (4.4.0-53-generic x86_64), and compiling using g++ 4.8.5.

If running the two commands above (replacing the text file with some random data) gives you the same error, then it might indeed be a platform problem. I don't have a Windows or Mac environment to debug it. You say this might be an old bug in fastText; do you know whether there is a GitHub issue associated with it?

kyoungrok0517 commented 6 years ago

https://github.com/facebookresearch/fastText/issues/239

This issue might be related to my segmentation fault on Mac. As for the counts.size() assertion failure... I think I'll need to test on a Linux machine, which I don't have right now.

hanabi1224 commented 6 years ago

I hit the same error when using the mingw64-c++ compiler to build a Windows DLL on Linux. I fixed it in my case by changing matrix.cc as follows:

if (sizeof(long) == 8) {
  in.read((char*) data_, m_ * n_ * sizeof(real));
} else {
  for (auto j = 0; j < n_; j++) {
    in.read((char*)(data_ + m_ * j), m_ * sizeof(real));
  }
}
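The workaround above replaces one large read with several smaller ones when `long` is not 8 bytes wide (as with MinGW-w64 targeting Windows, where `long` is 4 bytes). As a rough, standalone illustration of that chunked-read idea, here is a minimal sketch. The names `readChunked` and `roundTripDemo` are hypothetical and are not part of sent2vec or fastText, and `float` stands in for fastText's `real` type as an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <vector>

// Hypothetical helper (not from sent2vec/fastText): reads `count` floats from
// `in` into `dst`, issuing one read() call per chunk of at most `chunk`
// elements so that no single call has to carry a very large byte count.
bool readChunked(std::istream& in, float* dst, std::int64_t count,
                 std::int64_t chunk) {
  if (chunk <= 0) return false;  // guard against a chunk size that never advances
  std::int64_t done = 0;
  while (done < count) {
    const std::int64_t n = std::min(chunk, count - done);
    in.read(reinterpret_cast<char*>(dst + done),
            static_cast<std::streamsize>(
                n * static_cast<std::int64_t>(sizeof(float))));
    if (!in) return false;  // short read or stream error
    done += n;
  }
  return true;
}

// Round-trip check: write values to an in-memory binary stream, then read
// them back two elements at a time and compare with the original.
bool roundTripDemo() {
  const std::vector<float> src = {1.0f, 2.5f, -3.0f, 4.25f, 5.0f};
  std::stringstream buf(std::ios::in | std::ios::out | std::ios::binary);
  buf.write(reinterpret_cast<const char*>(src.data()),
            static_cast<std::streamsize>(src.size() * sizeof(float)));
  std::vector<float> dst(src.size(), 0.0f);
  const bool ok = readChunked(buf, dst.data(),
                              static_cast<std::int64_t>(dst.size()), 2);
  return ok && dst == src;
}
```

In the snippet from matrix.cc, the chunk corresponds to `m_` elements and the loop runs `n_` times, which covers the same `m_ * n_` elements as the single read.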

martinjaggi commented 6 years ago

Apparently this has been resolved upstream in fastText; see facebookresearch/fastText#239.