bage79 / word2vec4kor


# word2vec4kor

- `tensorboard_log`: word2vec visualization data for TensorBoard

```shell
git clone https://github.com/bage79/word2vec4kor
# the path below assumes the repository was cloned into ~/workspace
tensorboard --logdir ~/workspace/word2vec4kor/tensorboard_log
```

![demo](https://github.com/bage79/word2vec4kor/raw/master/img/demo.png)
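To view your own word vectors in the same way, one option is the standalone Embedding Projector, which can load a TSV file of vectors together with a TSV file of labels. The sketch below produces those two files from a plain `dict` mapping each word to its vector. It is an illustration under that assumption, not a description of the files in this repository's `tensorboard_log` directory, and the names `to_projector_tsv`, `write_projector_files`, `vectors.tsv`, and `metadata.tsv` are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def to_projector_tsv(vectors: dict[str, list[float]]) -> tuple[str, str]:
    """Return (vectors_tsv, metadata_tsv) as strings.

    vectors_tsv: one tab-separated vector per row.
    metadata_tsv: one word per row, in the same order. There is no header
    line because there is only a single label column.
    Assumes words contain no tabs or newlines.
    """
    vec_rows: list[str] = []
    meta_rows: list[str] = []
    for word, vec in vectors.items():
        vec_rows.append("\t".join(f"{x:.6f}" for x in vec))
        meta_rows.append(word)
    return "\n".join(vec_rows) + "\n", "\n".join(meta_rows) + "\n"


def write_projector_files(vectors: dict[str, list[float]], out_dir: str) -> None:
    """Write vectors.tsv and metadata.tsv into out_dir (hypothetical file names)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    vec_tsv, meta_tsv = to_projector_tsv(vectors)
    (out / "vectors.tsv").write_text(vec_tsv, encoding="utf-8")
    (out / "metadata.tsv").write_text(meta_tsv, encoding="utf-8")


if __name__ == "__main__":
    # Toy 2-dimensional vectors, for illustration only.
    toy = {"서울": [0.1, 0.2], "부산": [0.3, -0.1]}
    write_projector_files(toy, "projector_data")
```

Keeping the row order identical in both files matters, because the projector matches the i-th label to the i-th vector.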

### `ko.wikipedia.org.sentences` raw corpus 
- Source: Korean Wikipedia (`https://ko.wikipedia.org`)
- Total sentences: approximately 3,115,431
```shell
wget https://gitlab.com/bage79/nlp4kor-ko.wikipedia.org/raw/master/data/ko.wikipedia.org.sentences.gz
gzip -d ko.wikipedia.org.sentences.gz
```
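To sanity-check the download (for example, against the sentence count above), the corpus can be streamed without loading it all into memory. This sketch assumes the file holds one UTF-8 sentence per line and splits tokens on whitespace, which for Korean yields eojeol (space-delimited word units) rather than morphemes. `iter_sentences` and `corpus_stats` are hypothetical helper names. Python's `gzip.open` reads the `.gz` file directly, so the `gzip -d` step is optional for this purpose.

```python
from __future__ import annotations

import gzip
from collections import Counter
from collections.abc import Iterator
from pathlib import Path


def iter_sentences(path: str) -> Iterator[list[str]]:
    """Yield each non-empty line of a gzip-compressed UTF-8 file as a list of
    whitespace-separated tokens (assumes one sentence per line)."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            tokens = line.split()
            if tokens:  # skip blank lines
                yield tokens


def corpus_stats(path: str) -> tuple[int, Counter[str]]:
    """Count sentences and token frequencies in a single streaming pass."""
    n_sentences = 0
    counts: Counter[str] = Counter()
    for tokens in iter_sentences(path):
        n_sentences += 1
        counts.update(tokens)
    return n_sentences, counts


if __name__ == "__main__":
    corpus = Path("ko.wikipedia.org.sentences.gz")
    if corpus.exists():
        n, counts = corpus_stats(str(corpus))
        print(f"sentences: {n:,}")
        print(counts.most_common(10))
```

If you have already decompressed the file, replace `gzip.open(path, "rt", encoding="utf-8")` with `open(path, encoding="utf-8")`.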

### Tips
- Download the Korean Wikipedia dump file
- Parse the dump file (MediaWiki format) into a text file
- Word2vec open-source implementations
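Whichever implementation you pick, skip-gram word2vec trains on (center word, context word) pairs drawn from a window around each token. The sketch below generates those pairs for one tokenized sentence using a fixed window. It deliberately omits what real implementations add on top (randomly shrunk window sizes, subsampling of frequent words, negative sampling), and `skipgram_pairs` is a hypothetical name.

```python
from __future__ import annotations

from collections.abc import Iterator


def skipgram_pairs(tokens: list[str], window: int = 2) -> Iterator[tuple[str, str]]:
    """Yield (center, context) pairs, taking up to `window` neighbours on
    each side of every token. Uses a fixed window size for simplicity."""
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                yield center, tokens[j]


if __name__ == "__main__":
    print(list(skipgram_pairs(["a", "b", "c"], window=1)))
    # [('a', 'b'), ('b', 'a'), ('b', 'c'), ('c', 'b')]
```

Words near the sentence edges get fewer context words because the window is clipped to the sentence boundaries.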