SkyworkAI / Skywork

Skywork series models are pre-trained on 3.2TB of high-quality multilingual (mainly Chinese and English) and code data. We have open-sourced the model weights, training data, evaluation data, and evaluation methods.

The data downloaded from Hugging Face is rather large #82

Open qilong-zhang opened 3 months ago

qilong-zhang commented 3 months ago

According to the documentation, the data should take about 600 GB of disk space, but loading it through Hugging Face has already used more than 1 TB and is still not finished:

Generating train split: 208463017 examples [1:07:59, 51102.37 examples/s]