Kyubyong / wordvectors

Pre-trained word vectors of 30+ languages
MIT License
2.22k stars, 393 forks

Divide word vectors into simplified Chinese and traditional Chinese #16

Closed: pencoa closed this issue 6 years ago

pencoa commented 6 years ago

There are two forms of written Chinese: simplified Chinese and traditional Chinese. For example, 国 and 國 have the same meaning and pronunciation. A given Chinese article is usually written in only one of the two forms. If you could split the pre-trained Chinese word vectors into a simplified set and a traditional set, as Facebook's fastText project does, it could substantially improve performance.

pencoa commented 6 years ago

In practice, programmers can trim the embedding files themselves, so it doesn't matter.
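Trimming the embedding file yourself can be sketched as below. This is a minimal, hypothetical example rather than anything provided by the repository: it assumes each vector line is in word2vec text format (`word v1 v2 ... vn`) with any header line already removed, and it uses a tiny illustrative table of traditional/simplified character pairs (`TRAD_TO_SIMP`). A real filter would need a complete conversion table, for example from a project such as OpenCC.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical, deliberately tiny sample of traditional -> simplified pairs.
# A real filter needs a full mapping table; this is for illustration only.
TRAD_TO_SIMP = {"國": "国", "語": "语", "學": "学", "們": "们"}
TRADITIONAL_ONLY = set(TRAD_TO_SIMP)
SIMPLIFIED_ONLY = set(TRAD_TO_SIMP.values())


def is_simplified(word: str) -> bool:
    """True if the word contains no known traditional-only character."""
    return not any(ch in TRADITIONAL_ONLY for ch in word)


def is_traditional(word: str) -> bool:
    """True if the word contains no known simplified-only character."""
    return not any(ch in SIMPLIFIED_ONLY for ch in word)


def trim_vectors(lines: Iterable[str], keep: Callable[[str], bool]) -> Iterator[str]:
    """Yield only the vector lines whose leading word passes `keep`.

    Assumes word2vec text format ('word v1 v2 ... vn') without a header line.
    """
    for line in lines:
        word = line.split(" ", 1)[0]
        if keep(word):
            yield line


if __name__ == "__main__":
    sample = ["中国 0.1 0.2", "中國 0.3 0.4", "你好 0.5 0.6"]
    print(list(trim_vectors(sample, is_simplified)))
    print(list(trim_vectors(sample, is_traditional)))
```

Words built only from characters shared by both forms (such as 你好) pass both predicates, so they stay in both trimmed files. If the original file does carry a word2vec header (`vocab_size dim`), strip it before filtering and write a new header with the updated count afterwards.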