eabdullin / Word2Vec.Net

Implementation of Word2Vec for the .Net framework

Output file #11

Closed michaelflam closed 8 years ago

michaelflam commented 8 years ago

I am unsure how exactly the output file is formatted. The first line is the number of words followed by another int. Is that the max size of each word? Then, on the next line, there is a word followed by a bunch of small doubles (seemingly between -0.005 and 0.005). There are far more doubles than words (e.g. 8 words, but way more than 8 doubles), so I assume these are not distances. There is one such line for each word. When I use the Distance class, it doesn't get all the words; instead it gets only the first word, and every entry after that is just a bunch of nonsense numbers which don't seem to correspond to anything.
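For reference, here is a minimal sketch of how a file with this layout could be read line by line. It assumes the output follows the text format of the original word2vec tool, where the header holds the vocabulary size and then the vector length (not a maximum word length), and each later line is a word followed by that many floats. Whether Word2Vec.Net writes exactly this layout is an assumption; `parse_text_vectors` and the sample data are made up for illustration.

```python
def parse_text_vectors(text):
    """Parse word2vec-style text output into (vocab_size, dim, vectors).

    Assumed layout (taken from the original word2vec text format):
      line 1:      "<vocab_size> <dim>"
      other lines: "<word> <float> <float> ... " with exactly <dim> floats
    """
    lines = text.strip().splitlines()
    # Header: number of words, then the length of each word's vector.
    vocab_size, dim = (int(n) for n in lines[0].split())

    vectors = {}
    for line in lines[1:]:
        parts = line.split()
        if not parts:
            continue  # tolerate blank lines
        word = parts[0]
        values = [float(v) for v in parts[1:]]
        if len(values) != dim:
            raise ValueError(f"{word!r}: expected {dim} values, got {len(values)}")
        vectors[word] = values
    return vocab_size, dim, vectors


# Hypothetical two-word file with three-dimensional vectors.
sample = "2 3\nhello 0.001 -0.002 0.003 \nworld -0.004 0.0005 0.002 \n"
vocab_size, dim, vectors = parse_text_vectors(sample)
print(vocab_size, dim, vectors["world"])
```

Under that assumption, the number of doubles on each line matches the second header value, which would explain why there are many more doubles than words.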

michaelflam commented 8 years ago

I found the issue: the file parsing, which reads 4 bytes after reading the word, produces very erroneous results. I wrote another method, ReadFloat, which reads each float in, and replaced the other code with it. That fixed the problem. I also wrote a method that advances the stream to the end of the line.
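A rough sketch of that kind of fix, written in Python rather than the project's C#, is below. It is not the actual patch: `read_token`, `read_float`, and `skip_line` are hypothetical names, and the sample data assumes the file stores each float as text. The first lines show why interpreting 4 raw bytes of such a file as a binary float yields nonsense.

```python
import io
import struct


def read_token(stream):
    """Read one whitespace-delimited token from a seekable binary stream."""
    token = b""
    while True:
        ch = stream.read(1)
        if ch == b"":
            break  # end of stream
        if ch.isspace():
            if not token:
                continue  # skip leading whitespace
            if ch == b"\n":
                # Leave the newline in place so skip_line() can consume it.
                stream.seek(-1, io.SEEK_CUR)
            break
        token += ch
    return token.decode("utf-8")


def read_float(stream):
    """Parse the next token as a textual float, e.g. '-0.002'."""
    return float(read_token(stream))


def skip_line(stream):
    """Advance the stream past the end of the current line."""
    stream.readline()


# Treating the ASCII text "0.00" as a raw 4-byte float gives a meaningless value.
garbage = struct.unpack("<f", b"0.00")[0]

# Hypothetical text-format data: header, then one word per line.
data = b"2 3\nhello 0.001 -0.002 0.003 \nworld -0.004 0.0005 0.002 \n"
stream = io.BytesIO(data)
vocab_size, dim = (int(n) for n in stream.readline().split())

vectors = {}
for _ in range(vocab_size):
    word = read_token(stream)
    vectors[word] = [read_float(stream) for _ in range(dim)]
    skip_line(stream)

print(garbage, vectors)
```

Reading token by token and then skipping to the end of the line keeps the reader aligned on word boundaries even if a line carries trailing spaces.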