SCIR-HI / Huatuo-Llama-Med-Chinese

Repo for BenTsao [original name: HuaTuo (华驼)], Instruction-tuning Large Language Models with Chinese Medical Knowledge.
Apache License 2.0

An error occurred while generating the dataset #71

Closed: chencoder1 closed this issue 9 months ago

chencoder1 commented 10 months ago

raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.builder.DatasetGenerationError: An error occurred while generating the dataset

This error appears once my training data exceeds roughly 20,000 examples. Has anyone else run into this?

s65b40 commented 10 months ago

We have also trained on larger datasets and have not seen a similar error. You may want to check whether the dataset format has a problem.
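
One way to act on the suggestion to check the dataset format is to scan the training file record by record before handing it to `datasets`. The sketch below is not from the repository. It assumes the file is JSON Lines (one JSON object per line) and that every record should carry string-valued `instruction`, `input`, and `output` fields; the function name `find_bad_records` and the key list are illustrative and should be adjusted to the actual data. If the error only shows up past a certain size, one possible cause is a single malformed record further down the file, which a scan like this would surface.

```python
import json

# Assumed schema (not confirmed by the thread): each line is a JSON object
# whose "instruction", "input", and "output" values are all strings.
REQUIRED_KEYS = ("instruction", "input", "output")


def find_bad_records(path, required_keys=REQUIRED_KEYS):
    """Return a list of (line_number, problem) for records that look malformed."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            # Blank lines are reported so they can be reviewed or removed.
            if not line.strip():
                problems.append((lineno, "blank line"))
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((lineno, f"invalid JSON: {exc.msg}"))
                continue
            if not isinstance(record, dict):
                problems.append((lineno, "not a JSON object"))
                continue
            for key in required_keys:
                if key not in record:
                    problems.append((lineno, f"missing key: {key}"))
                elif not isinstance(record[key], str):
                    # Mixed value types across records are worth flagging,
                    # since they make the inferred schema inconsistent.
                    found = type(record[key]).__name__
                    problems.append((lineno, f"{key} is {found}, expected str"))
    return problems
```

Calling `find_bad_records("llama_data.json")` (path assumed) and printing each returned pair gives the line numbers to inspect by hand.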

zhu-code commented 1 week ago

Is the llama_data.json file the data built from cmedkg?