First, I downloaded some README.md files into the folder Readme.
Second, I did some language parsing to clean the data.
Third, I used word2vec to train a model.
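
The three steps above can be sketched roughly as follows. This is only a minimal sketch under my own assumptions, not the exact script: the cleaning rules (dropping code blocks, links, URLs and HTML tags), the function names `clean_readme`, `load_corpus` and `train_model`, and the training parameters are all hypothetical. Training assumes the third-party gensim library, version 4.0 or later, is installed.

```python
import re
from pathlib import Path

# Hypothetical cleaning rules; the real parsing step may differ.
FENCE_RE = re.compile(r"```.*?```", re.DOTALL)      # fenced code blocks
INLINE_CODE_RE = re.compile(r"`[^`]*`")             # inline code spans
LINK_RE = re.compile(r"!?\[([^\]]*)\]\([^)]*\)")    # markdown links and images
URL_RE = re.compile(r"https?://\S+")                # bare URLs
HTML_TAG_RE = re.compile(r"<[^>]+>")                # HTML tags
TOKEN_RE = re.compile(r"[a-z][a-z0-9]*")            # lowercase word tokens


def clean_readme(text):
    """Strip markdown noise and return a list of lowercase tokens."""
    text = FENCE_RE.sub(" ", text)
    text = INLINE_CODE_RE.sub(" ", text)
    text = LINK_RE.sub(r"\1", text)   # keep the link text, drop the target
    text = URL_RE.sub(" ", text)
    text = HTML_TAG_RE.sub(" ", text)
    return TOKEN_RE.findall(text.lower())


def load_corpus(folder="Readme"):
    """Read every .md file in the folder and return one token list per file."""
    corpus = []
    for path in sorted(Path(folder).glob("*.md")):
        tokens = clean_readme(path.read_text(encoding="utf-8", errors="ignore"))
        if tokens:
            corpus.append(tokens)
    return corpus


def train_model(corpus):
    """Train a word2vec model on the token lists (assumes gensim >= 4.0)."""
    from gensim.models import Word2Vec  # third-party dependency
    # Parameter values are placeholders, not tuned settings.
    return Word2Vec(sentences=corpus, vector_size=100, window=5,
                    min_count=2, workers=2)
```

With these helpers, `train_model(load_corpus("Readme"))` would train on every cleaned README in the folder.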
Our objective is to use the word2vec model to make the README files searchable, but word2vec does not seem to provide a similarity calculation on whole documents. So I decided to use the second approach proposed by Dr. Mockus.
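
To show why this is a gap: word2vec gives one vector per word, so comparing two READMEs needs an extra step on top of it. One common workaround, shown here only for illustration and not necessarily the approach proposed by Dr. Mockus, is to average the word vectors of each document and compare the averages with cosine similarity. The function names are hypothetical, and `word_vectors` is assumed to be any mapping from word to vector (a plain dict here; with gensim 4.x the `wv` attribute of a trained model should behave similarly).

```python
import math


def average_vector(tokens, word_vectors):
    """Average the vectors of all tokens that the model knows.

    Returns None when no token has a vector.
    """
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def document_similarity(tokens_a, tokens_b, word_vectors):
    """Compare two tokenized documents through their averaged word vectors."""
    vec_a = average_vector(tokens_a, word_vectors)
    vec_b = average_vector(tokens_b, word_vectors)
    if vec_a is None or vec_b is None:
        return 0.0
    return cosine(vec_a, vec_b)
```

Averaging ignores word order and treats every word as equally important, which is one reason a dedicated document-level method may work better for search.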