mongrel73 closed this issue 8 years ago.
Hi, you have to add one more parameter to word2vec, 'binary', so that the vectors are saved in binary format:
var word2vec = Word2VecBuilder.Create()
    .WithTrainFile(inputFile)   // training text
    .WithOutputFile(outputFile) // where the trained vectors are written
    .WithBinary(1)              // 1 = save the vectors in binary format
    .Build();
Then search the vectors with:
var distance = new Distance(outputFile);
BestWord[] bestwords = distance.Search("Texas");
'Analogy' is for searching word analogies, e.g. 'usa washington russia' -> 'moscow'.
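An analogy query like 'usa washington russia' -> 'moscow' is usually answered with vector arithmetic: compute vec(washington) - vec(usa) + vec(russia) and return the remaining word whose vector has the highest cosine similarity to the result (the approach of the original C word2vec `word-analogy` tool). The sketch below shows that idea on an in-memory `Dictionary<string, float[]>`. `AnalogySketch` and `Solve` are hypothetical names, not Word2Vec.Net API, and I haven't checked that the library's `Analogy` class works exactly this way.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of Word2Vec.Net): solves "a is to b as c is to ?"
// with the usual word2vec arithmetic b - a + c, ranked by cosine similarity.
public static class AnalogySketch
{
    // Scales a vector to unit length (zero vectors are returned unchanged).
    static float[] Normalize(float[] v)
    {
        double len = Math.Sqrt(v.Sum(x => (double)x * x));
        return len == 0 ? v : v.Select(x => (float)(x / len)).ToArray();
    }

    static double Cosine(float[] a, float[] b)
    {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / (Math.Sqrt(na) * Math.Sqrt(nb));
    }

    // Returns the best answer, or null if any input word is missing.
    public static string Solve(Dictionary<string, float[]> vectors, string a, string b, string c)
    {
        if (!vectors.ContainsKey(a) || !vectors.ContainsKey(b) || !vectors.ContainsKey(c))
            return null;

        var va = Normalize(vectors[a]);
        var vb = Normalize(vectors[b]);
        var vc = Normalize(vectors[c]);

        // query = b - a + c, e.g. washington - usa + russia
        var query = new float[va.Length];
        for (int i = 0; i < query.Length; i++)
            query[i] = vb[i] - va[i] + vc[i];

        // The three input words are excluded, as they tend to be their own nearest neighbours.
        return vectors
            .Where(kv => kv.Key != a && kv.Key != b && kv.Key != c)
            .OrderByDescending(kv => Cosine(query, kv.Value))
            .Select(kv => kv.Key)
            .FirstOrDefault();
    }
}
```

With real vectors you would call something like `AnalogySketch.Solve(vectors, "usa", "washington", "russia")`; whether it returns "moscow" depends entirely on how good the trained vectors are.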
P.S. Make sure you have enough data: a text of 2 million words or more is preferred.
P.P.S. I recommend converting all text to lowercase, because "Texas" and "texas" will be different tokens.
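A minimal way to do the lowercasing is to write a lowercased copy of the training text and pass that copy to `WithTrainFile`. `CorpusPrep.LowercaseFile` is a hypothetical helper, not part of Word2Vec.Net. It streams the file line by line, so large corpora don't have to fit in memory, and uses `ToLowerInvariant` so the result doesn't depend on the machine's culture settings.

```csharp
using System.IO;

// Hypothetical helper (not part of Word2Vec.Net): writes a lowercased copy of a
// training file so that "Texas" and "texas" become the same token.
public static class CorpusPrep
{
    public static void LowercaseFile(string inputPath, string outputPath)
    {
        using (var reader = new StreamReader(inputPath))
        using (var writer = new StreamWriter(outputPath))
        {
            string line;
            // Read and write one line at a time instead of loading the whole corpus.
            while ((line = reader.ReadLine()) != null)
                writer.WriteLine(line.ToLowerInvariant());
        }
    }
}
```

After training on the lowercased copy, remember to query in lowercase too, e.g. `distance.Search("texas")`.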
@mongrel73, I've described all of the word2vec parameters in more detail on the main page, so you can configure word2vec for your own task. For example, the size of the word vectors (the number of features per word) may be useful for you.
Your README explains how to use input data to create vectors and write them to a .txt file.
This works (at least, outputFile.txt is created and seems to be full of vectors), but now I want to use outputFile.txt to find words similar to "Texas". How do I do that?
The full program I'm trying is pasted below. Using distance.Search gives me an empty array, and analogy.Search does return results, but the "Word" property of each "BestWord" is a number followed by null characters:
Is there a simple way to input "Texas" and get back ["Arizona", "Oklahoma", "Kansas"], etc.?
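If the vectors are kept in text format instead of setting binary to 1, the "input Texas, get similar words" lookup can also be sketched by hand. This assumes the text layout of the original word2vec tool: a first line with the vocabulary size and vector size, then one line per word with the word followed by its vector components. Similar words are the ones whose vectors have the highest cosine similarity to the query word's vector. `TextVectors` is a hypothetical helper, not part of Word2Vec.Net.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;

// Hypothetical helper (not part of Word2Vec.Net): nearest-neighbour search over a
// text-format vectors file, ranked by cosine similarity.
public static class TextVectors
{
    // Parses "word v1 v2 ... vN" lines; the first line ("vocabSize vectorSize") is skipped.
    public static Dictionary<string, float[]> Parse(IEnumerable<string> lines)
    {
        var vectors = new Dictionary<string, float[]>();
        foreach (var line in lines.Skip(1))
        {
            var parts = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length < 2) continue; // skip blank or malformed lines
            vectors[parts[0]] = parts.Skip(1)
                .Select(p => float.Parse(p, CultureInfo.InvariantCulture))
                .ToArray();
        }
        return vectors;
    }

    public static Dictionary<string, float[]> Load(string path) => Parse(File.ReadLines(path));

    static double Cosine(float[] a, float[] b)
    {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / (Math.Sqrt(na) * Math.Sqrt(nb));
    }

    // Returns the topN words most similar to `word` (excluding the word itself),
    // or an empty list if the word is not in the vocabulary.
    public static List<(string Word, double Score)> Nearest(
        Dictionary<string, float[]> vectors, string word, int topN = 10)
    {
        if (!vectors.TryGetValue(word, out var target))
            return new List<(string Word, double Score)>();

        return vectors
            .Where(kv => kv.Key != word)
            .Select(kv => (Word: kv.Key, Score: Cosine(target, kv.Value)))
            .OrderByDescending(x => x.Score)
            .Take(topN)
            .ToList();
    }
}
```

Usage would look like `var similar = TextVectors.Nearest(TextVectors.Load("outputFile.txt"), "texas", 3);`. This loads every vector into memory and scans the whole vocabulary for each query, which is simple but only practical for modest vocabulary sizes.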