Embedding / Chinese-Word-Vectors

100+ Chinese Word Vectors (hundreds of pre-trained Chinese word vectors)
Apache License 2.0
11.82k stars, 2.32k forks

How were the training sentences segmented? #112

Closed: a7744hsc closed this issue 4 years ago

a7744hsc commented 4 years ago

When using the embeddings, I would prefer to apply the same segmentation method that was used for training, since this should help reduce out-of-vocabulary (OOV) words.
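To illustrate why the segmentation method matters, here is a minimal sketch that measures the OOV rate of the same sentence under two segmenters. Everything in it is hypothetical: the toy vocabulary, the `oov_rate` helper, and both segmenters (a simple forward maximum-matching segmenter and a per-character split). It does not reproduce the segmenter actually used to train these vectors.

```python
from typing import Callable, Iterable, List, Set


def oov_rate(sentences: Iterable[str],
             segment: Callable[[str], List[str]],
             vocab: Set[str]) -> float:
    """Fraction of tokens produced by `segment` that are missing from `vocab`."""
    total = 0
    missing = 0
    for sentence in sentences:
        for token in segment(sentence):
            total += 1
            if token not in vocab:
                missing += 1
    return missing / total if total else 0.0


def char_segmenter(text: str) -> List[str]:
    """Mismatched segmentation: split into single characters."""
    return list(text)


def max_match_segmenter(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: take the longest lexicon entry at each position,
    falling back to a single character when nothing matches."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


# Toy embedding vocabulary (assumption, not taken from the real files).
vocab = {"预训练", "中文", "词向量"}
sentences = ["预训练中文词向量"]

matched = oov_rate(sentences, lambda s: max_match_segmenter(s, vocab), vocab)
mismatched = oov_rate(sentences, char_segmenter, vocab)
print(f"word-level OOV rate: {matched:.2f}, character-level OOV rate: {mismatched:.2f}")
```

In this toy case the word-level segmentation yields tokens that all appear in the vocabulary, while the character-level split yields none that do.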

By the way, it would be great if you could provide a small sample of each large file. That would give me a chance to check the format and tokens before starting a long download.
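For reference, this is the kind of check I have in mind, as a minimal sketch. It assumes the common word2vec text layout (a header line with the word count and the dimension, then one word followed by its values per line); the `peek_vectors` helper and the sample data are hypothetical and are not taken from the actual files.

```python
import io
from itertools import islice
from typing import Iterable, List, Tuple


def peek_vectors(lines: Iterable[str], n: int = 3) -> Tuple[int, int, List[Tuple[str, List[float]]]]:
    """Read the header and the first `n` entries without loading the whole file."""
    it = iter(lines)
    header = next(it).split()
    count, dim = int(header[0]), int(header[1])
    rows = []
    for line in islice(it, n):
        parts = line.rstrip().split(" ")
        word = parts[0]
        values = [float(x) for x in parts[1:]]
        if len(values) != dim:
            raise ValueError(f"expected {dim} values for {word!r}, got {len(values)}")
        rows.append((word, values))
    return count, dim, rows


# Tiny in-memory stand-in for a vector file (made-up values).
sample = io.StringIO("2 3\n中文 0.1 0.2 0.3\n词向量 0.4 0.5 0.6\n")
count, dim, rows = peek_vectors(sample)
print(count, dim, [word for word, _ in rows])
```

With a real file, the same function could be given `open(path, encoding="utf-8")`, or `bz2.open(path, "rt", encoding="utf-8")` if the download is bz2-compressed.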

shenshen-hungry commented 4 years ago

Please refer to https://github.com/Embedding/Chinese-Word-Vectors#corpus