yucc-leon closed this issue 10 months ago.
The llama-7B and GPT-7B mentioned in our technical report are toy models trained on only 200B tokens. This preliminary experiment was intended only to verify the superiority of the LLaMA architecture (RoPE + RMSNorm + SwiGLU).
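For anyone unfamiliar with the three components, here is a minimal PyTorch sketch of what they look like; this is illustrative only (hypothetical dimensions, not this repo's actual implementation):

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """RMSNorm: rescale by the root-mean-square of activations (no mean subtraction, no bias)."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return x * rms * self.weight

class SwiGLU(nn.Module):
    """SwiGLU feed-forward block: silu(x W1) * (x W3), projected back with W2."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden, bias=False)
        self.w3 = nn.Linear(dim, hidden, bias=False)
        self.w2 = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(nn.functional.silu(self.w1(x)) * self.w3(x))

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """RoPE: rotate channel pairs of x (shape ..., seq, dim) by position-dependent angles."""
    seq, dim = x.shape[-2], x.shape[-1]
    freqs = 1.0 / base ** (torch.arange(0, dim, 2, dtype=x.dtype) / dim)
    angles = torch.arange(seq, dtype=x.dtype)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```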
In Chapter 3 of your tech report you compared llama-7B against (your) GPT-7B, but you ultimately released a 13B model. So there are two questions: