How do I add to or modify the vocabulary file vocab.txt?

ZhuiyiTechnology / WoBERT

A Chinese BERT that uses words as its basic unit

Apache License 2.0


Open Crescentz opened 4 years ago

Crescentz commented 4 years ago

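BERT's Chinese vocab.txt contains too few Chinese characters. When this comes up in a vertical domain, how do you add your own tokens? There are not enough [unused] slots.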

ZhuiyiTechnology commented 4 years ago

1. Add them to vocab.txt. 2. Append them via the compound_tokens parameter.

The above applies only to bert4keras. It is worth studying closely how the training script appends words: https://github.com/ZhuiyiTechnology/WoBERT/blob/master/train.py
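The two steps can be sketched in plain Python, without bert4keras, to show what the extra argument carries. This is only an illustration under stated assumptions: `extend_vocab` and `init_new_embeddings` are hypothetical helper names, the character-by-character split stands in for a real tokenizer, and the mean initialization reflects my understanding of how bert4keras treats `compound_tokens`, so check the library source before relying on it.

```python
def extend_vocab(token_dict, new_words, unk_token="[UNK]"):
    """Append new_words to a copy of token_dict.

    Returns (extended_dict, compound_tokens), where compound_tokens[i] lists
    the ORIGINAL-vocabulary ids that make up the i-th newly added word.
    """
    extended = dict(token_dict)
    compound_tokens = []
    for word in new_words:
        if not word or word in extended:
            continue  # skip empty strings and words that already have an id
        # Assumption: a character-level split stands in for a real tokenizer.
        piece_ids = [token_dict.get(ch, token_dict[unk_token]) for ch in word]
        extended[word] = len(extended)  # new ids go at the end of the table
        compound_tokens.append(piece_ids)
    return extended, compound_tokens


def init_new_embeddings(embeddings, compound_tokens):
    """Return the embedding table with one extra row per new token.

    Each new row is the mean of the rows of the token's pieces (assumed
    initialization scheme, see the note above).
    """
    rows = [list(row) for row in embeddings]
    dim = len(embeddings[0])
    for piece_ids in compound_tokens:
        rows.append([
            sum(embeddings[i][d] for i in piece_ids) / len(piece_ids)
            for d in range(dim)
        ])
    return rows


# Toy vocabulary and a toy 2-dimensional embedding table (made-up values).
base_vocab = {"[PAD]": 0, "[UNK]": 1, "心": 2, "肌": 3, "炎": 4}
base_table = [[0.0, 0.0], [1.0, 1.0], [2.0, 4.0], [4.0, 2.0], [6.0, 6.0]]

# "炎" is already present, so only "心肌炎" is appended.
vocab, compound_tokens = extend_vocab(base_vocab, ["心肌炎", "炎"])
new_table = init_new_embeddings(base_table, compound_tokens)
```

With bert4keras itself, the extended dictionary would go to the tokenizer and the `compound_tokens` list to the model-building call, as train.py does; the exact call is best taken from that script rather than from this sketch.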

yuhaiyan-77 commented 1 month ago

Hello, I am unable to download the files. Is there another way to download the model?

alanbreeze commented 1 month ago

Downloads have been restored.