baichuan-inc / Baichuan2

A series of large language models developed by Baichuan Intelligent Technology
https://huggingface.co/baichuan-inc
Apache License 2.0

Where does the Baichuan2 training-data taxonomy come from? #409

Open feifei2023 opened 1 month ago

feifei2023 commented 1 month ago

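Hello, and thank you for releasing Baichuan. I have a question about the paper. Figure 1 visualizes the training data by category, with 36 categories in total. How were these 36 categories obtained? @baichuan-assistant Thanks!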