eabdullin / Word2Vec.Net

Implementation of Word2Vec for the .NET Framework

Bin file and the closest words #8

Closed: visualizeMath closed this issue 8 years ago

visualizeMath commented 8 years ago

Hi folks. I want to learn how to find the most similar words after running the code from your page (the example under the "//more explicit option" comment). I've added the Word2Vec.Net library to my project and ran the following code:

    string trainfile = @"C:\Users\Berhum\Desktop\kokler.txt";
    string outputFileName = @"C:\Users\Berhum\Desktop\vektor.bin";
    var word2Vec = Word2VecBuilder.Create()
        .WithTrainFile(trainfile)          // Use text data to train the model
        .WithOutputFile(outputFileName)    // Use <file> to save the resulting word vectors / word clusters
        .WithSize(200)                     // Set size of word vectors; default is 100
        //.WithSaveVocubFile()             // The vocabulary will be saved to <file>
        .WithDebug(2)                      // Set the debug mode (default = 2 = more info during training)
        .WithBinary(1)                     // Save the resulting vectors in binary mode; default is 0 (off)
        .WithCBow(1)                       // Use the continuous bag of words model; default is 1 (use 0 for skip-gram model)
        //.WithAlpha(0.05)                 // Set the starting learning rate; default is 0.025 for skip-gram and 0.05 for CBOW
        .WithWindow(5)                     // Set max skip length between words; default is 5
        .WithSample((float)1e-3)           // Set threshold for occurrence of words. Those that appear with higher frequency
                                           // in the training data will be randomly down-sampled; default is 1e-3, useful range is (0, 1e-5)
        //.WithHs(0)                       // Use Hierarchical Softmax; default is 0 (not used)
        .WithNegative(5)                   // Number of negative examples; default is 5, common values are 3 - 10 (0 = not used)
        .WithThreads(5)                    // Use <int> threads (default 12)
        .WithIter(5)                       // Run more training iterations (default 5)
        .WithMinCount(5)                   // Discard words that appear less than <int> times; default is 5
        .WithClasses(0)                    // Output word classes rather than word vectors; default number of classes is 0 (vectors are written)
        .Build();

    word2Vec.TrainModel();
    var distance = new Distance(outputFileName);
    BestWord[] bestwords = distance.Search("yedek");

After running the application, I get a .bin file on my Desktop (vektor.bin). How can I use this file to find the closest words? Thanks in advance.
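For reference, a .bin file written in binary mode by the original C word2vec tool has a simple layout: a text header "<vocabSize> <vectorSize>", then, for each word, the word itself, a space, and vectorSize 4-byte floats, followed by a newline. The sketch below assumes Word2Vec.Net writes the same layout and that words are UTF-8 encoded; VectorFileReader, Load and ReadToken are hypothetical names, not part of the library. It only illustrates what the file contains; the Distance class in the code above takes the file path and is meant to handle this parsing itself.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper for illustration; not part of Word2Vec.Net.
// Assumes the original C word2vec binary layout:
//   "<vocabSize> <vectorSize>\n" then, per word: "<word> " + vectorSize floats + "\n"
public static class VectorFileReader
{
    // Reads bytes up to (not including) the delimiter.
    // Leading newlines left over from the previous record are skipped.
    private static string ReadToken(BinaryReader reader, byte delimiter)
    {
        var bytes = new List<byte>();
        while (true)
        {
            byte b = reader.ReadByte();
            if (b == delimiter) break;
            if (bytes.Count == 0 && (b == (byte)'\n' || b == (byte)'\r')) continue;
            bytes.Add(b);
        }
        return Encoding.UTF8.GetString(bytes.ToArray()); // assumption: UTF-8 words
    }

    public static Dictionary<string, float[]> Load(string path)
    {
        var vectors = new Dictionary<string, float[]>();
        using (var reader = new BinaryReader(File.OpenRead(path)))
        {
            // Header line: vocabulary size and vector dimension.
            string[] header = ReadToken(reader, (byte)'\n').Trim()
                .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            int vocabSize = int.Parse(header[0]);
            int vectorSize = int.Parse(header[1]);

            for (int i = 0; i < vocabSize; i++)
            {
                string word = ReadToken(reader, (byte)' ');
                var vector = new float[vectorSize];
                for (int j = 0; j < vectorSize; j++)
                    vector[j] = reader.ReadSingle(); // 4-byte little-endian float
                vectors[word] = vector;
            }
        }
        return vectors;
    }
}
```

For example, VectorFileReader.Load(@"C:\Users\Berhum\Desktop\vektor.bin")["yedek"] would return the 200-dimensional vector for "yedek", provided that word survived the WithMinCount(5) cutoff.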

eabdullin commented 8 years ago

Hi, @curiousboy23! You just have to use the Distance class:

    var distance = new Distance(outputFileName); // outputFileName - your vector.bin file
    BestWord[] bestwords = distance.Search(inputword); // bestwords - the closest words to inputword

and then you can use the result wherever you want.
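Under the hood, "closest" words are usually found by cosine similarity between word vectors: the original C word2vec distance tool normalizes every vector and ranks the vocabulary by dot product with the query vector. We assume Distance.Search works along the same lines, but have not confirmed it. The sketch below shows that ranking over an in-memory Dictionary<string, float[]>; Neighbor and NearestWords are hypothetical names (Neighbor is not the library's BestWord), and the brute-force scan over the whole vocabulary is a simplification.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical result type for this sketch (not Word2Vec.Net's BestWord).
public sealed class Neighbor
{
    public string Word { get; set; }
    public double Similarity { get; set; }
}

// Hypothetical helper: brute-force nearest neighbors by cosine similarity.
public static class NearestWords
{
    // Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 for a zero vector.
    public static double Cosine(float[] a, float[] b)
    {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Returns up to `count` words most similar to `query`, excluding the query itself.
    // An out-of-vocabulary query yields an empty list.
    public static List<Neighbor> Search(Dictionary<string, float[]> vectors, string query, int count)
    {
        float[] target;
        if (!vectors.TryGetValue(query, out target))
            return new List<Neighbor>();

        return vectors
            .Where(kv => kv.Key != query)
            .Select(kv => new Neighbor { Word = kv.Key, Similarity = Cosine(target, kv.Value) })
            .OrderByDescending(n => n.Similarity)
            .Take(count)
            .ToList();
    }
}
```

For example, NearestWords.Search(vectors, "yedek", 10) returns up to ten neighbors sorted from most to least similar. Precomputing unit-length vectors once, as the C tool does, would avoid recomputing norms on every query.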