Injecting background knowledge into the word vectors
corpora folder: the script assumes one text per line and splits sentences on dots.
model.w2v file

To extract plain text from a Wikipedia XML dump:

    java -cp thewikimachine.jar org.fbk.cit.hlt.thewikimachine.xmldump.WikipediaTextExtractor \
      -d <path-to-dump.xml> \
      -o <path-to-output-directory> \
      -t <amount of threads>
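The corpora-folder convention (one text per line, sentences split on dots) is sketched below in Java. This is a minimal illustration only: the class name `CorpusSplitter` is hypothetical, and trimming whitespace and dropping empty fragments are assumptions rather than the script's documented behavior.

```java
import java.util.ArrayList;
import java.util.List;

public class CorpusSplitter {
    // Hypothetical helper: each input line is one text; each text is split into sentences on '.'
    public static List<List<String>> split(List<String> lines) {
        List<List<String>> texts = new ArrayList<>();
        for (String line : lines) {
            List<String> sentences = new ArrayList<>();
            // Split on literal dots (the regex "\\." matches a single '.')
            for (String part : line.split("\\.")) {
                // Assumption: surrounding whitespace is trimmed and empty fragments are dropped
                String s = part.trim();
                if (!s.isEmpty()) {
                    sentences.add(s);
                }
            }
            texts.add(sentences);
        }
        return texts;
    }

    public static void main(String[] args) {
        List<String> corpus = List.of(
            "Rome is in Italy. It is the capital.",
            "Word vectors capture context."
        );
        // Print the sentences found in each text
        for (List<String> text : split(corpus)) {
            System.out.println(text);
        }
    }
}
```

A naive dot split will also break on abbreviations and decimal numbers (e.g. "e.g." or "3.5"); that trade-off mirrors the simple rule stated above.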