SupritYoung / Zhongjing

A Chinese medical ChatGPT based on LLaMA, trained on a large-scale pretraining corpus and a multi-turn dialogue dataset.
Apache License 2.0

How much data was used for pretraining? #11

Closed murray-z closed 10 months ago

murray-z commented 1 year ago

How much data was used for pretraining? Is it all medical data, or does it include some general-domain data?

SupritYoung commented 11 months ago

Hi, almost all of the data is medical.

SupritYoung commented 11 months ago

For the exact figures, please see our paper: https://arxiv.org/abs/2308.03549