dbiir / UER-py

Open Source Pre-training Model Framework in PyTorch & Pre-trained Model Zoo
https://github.com/dbiir/UER-py/wiki
Apache License 2.0

Has Chinese word segmentation not been added yet? #6

Open Inspiring26 opened 5 years ago

iamzww commented 5 years ago

I don't think it has. The pre-trained models are all char-level, and the vocab they depend on is google_vocab.txt in every case.
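To make the char-level vs. word-level distinction concrete, here is a minimal sketch. It is not UER-py's tokenizer: `char_tokenize`, `word_tokenize`, and `toy_vocab` are hypothetical names, and the word splitter is a simple greedy forward maximum matching against a toy word list, chosen only for illustration.

```python
def char_tokenize(text):
    """Char-level: every non-whitespace character becomes its own token."""
    return [ch for ch in text if not ch.isspace()]


def word_tokenize(text, vocab, max_len=4):
    """Word-level (illustrative): greedy forward maximum matching.

    At each position, try the longest candidate first and fall back to a
    single character when nothing in the (hypothetical) vocab matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


sentence = "我喜欢自然语言处理"
toy_vocab = {"喜欢", "自然", "语言", "处理", "自然语言"}  # assumed toy word list

char_tokens = char_tokenize(sentence)
word_tokens = word_tokenize(sentence, toy_vocab)
print(char_tokens)
print(word_tokens)
```

With a char-level vocab the model never needs a segmenter, which is why no word segmentation step is required for the char-based models.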

Embedding commented 5 years ago

So far, all models are character-based. We will add a word-based BERT and a BERT with whole-word masking in the near future.

Embedding commented 5 years ago

A word-based BERT model is now available. Please see the Chinese_model_zoo section. The word-based BERT model is useful for finding a word's nearest neighbors. Examples can be found in the Qualitative evaluation section.
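For readers unfamiliar with nearest-neighbor lookup, the idea can be sketched as ranking words by cosine similarity between their embedding vectors. This is only an illustration under stated assumptions: `toy_embeddings` holds made-up 3-dimensional vectors rather than vectors from the word-based BERT model, and `cosine` and `nearest_neighbors` are hypothetical helpers, not UER-py functions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(word, embeddings, k=2):
    """Return the k words most similar to `word`, excluding the word itself."""
    target = embeddings[word]
    scored = [
        (other, cosine(target, vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up vectors for illustration only; real vectors would come from a model.
toy_embeddings = {
    "北京": [0.9, 0.1, 0.0],
    "上海": [0.8, 0.2, 0.1],
    "苹果": [0.0, 0.9, 0.3],
    "香蕉": [0.1, 0.8, 0.4],
}

neighbors = nearest_neighbors("北京", toy_embeddings, k=2)
for other, score in neighbors:
    print(other, round(score, 3))
```

A word-level vocab matters here because each entry is a whole word with its own vector, whereas a char-level vocab only provides vectors for individual characters.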