eabdullin / Word2Vec.Net

implementation Word2Vec for .Net framework
Other
126 stars 41 forks source link

Using word2vec format models? #23

Open demongolem opened 5 years ago

demongolem commented 5 years ago

In relation to what I saw in #20, I notice that if I get my own large text file and go through the entire training process creating a bin model, the distance.Vocab array appears to correctly contain the vocab (appearing 5 times because that is what I set)

However, if I try to directly use a word2vec model by doing `var distance = new Distance(modelName);' the data structure distance.Vocab gets the first word correct but then is followed by a bunch of floating point numbers and I get words like 0.14687.

So can this code handle word2vec format out of the box like that avoiding training? Is there a problem in that what training creates is binary format whereas the word2vec appears to be a text readable format?