baichuan-inc / Baichuan-7B

A large-scale 7B pretraining language model developed by BaiChuan-Inc.
https://huggingface.co/baichuan-inc/baichuan-7B
Apache License 2.0
5.67k stars 506 forks

[Question] Hello, could you share the code used to train the tokenizer model? Or are there any references? #117

Open StarrySeas1 opened 1 year ago

StarrySeas1 commented 1 year ago

Questions

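For the 20 million entries of Chinese and English data used to train the tokenizer model, what is the data distribution? Is there any further reference information?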
