SkyworkAI / Skywork

Skywork series models are pre-trained on 3.2TB of high-quality multilingual (mainly Chinese and English) and code data. We have open-sourced the model weights, training data, evaluation data, and evaluation methods.

Why release a 13B model instead of a smaller one, say, 7B? #6

Closed · yucc-leon closed 10 months ago

yucc-leon commented 10 months ago

In Chapter 3 of your tech report you compare LLaMA-7B against (your) GPT-7B, but you ultimately released a 13B model. So there are two questions:

  1. Will you release a smaller model?
  2. Why did you design the model architecture as described in the report? Does it perform better than the LLaMA architecture?
TianwenWei commented 10 months ago

The LLaMA-7B and GPT-7B mentioned in our technical report are toy models trained on only 200B tokens. This preliminary experiment was intended only to verify the superiority of the LLaMA architecture (RoPE + RMSNorm + SwiGLU).
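
For readers unfamiliar with those three components, here is a minimal, self-contained PyTorch sketch of RMSNorm, SwiGLU, and rotary position embeddings (RoPE). This is not Skywork's actual code; the shapes and hyperparameters below are illustrative only.

```python
# Minimal sketch of the three LLaMA-style components mentioned above.
# Not Skywork's implementation; dimensions are placeholders.
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Root-mean-square layer norm: scale-only, no mean subtraction."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * x * rms


class SwiGLU(nn.Module):
    """Gated feed-forward: silu(x @ W_gate) * (x @ W_up), projected back down."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden_dim, bias=False)
        self.up = nn.Linear(dim, hidden_dim, bias=False)
        self.down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x):
        return self.down(nn.functional.silu(self.gate(x)) * self.up(x))


def apply_rope(x, base: float = 10000.0):
    """Apply rotary position embeddings to x of shape (batch, seq, heads, head_dim)."""
    b, s, h, d = x.shape
    half = d // 2
    freqs = base ** (-torch.arange(0, half, dtype=torch.float32) / half)
    angles = torch.arange(s, dtype=torch.float32)[:, None] * freqs[None, :]  # (seq, half)
    cos = angles.cos()[None, :, None, :]
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


if __name__ == "__main__":
    x = torch.randn(2, 16, 512)            # (batch, seq, dim)
    print(RMSNorm(512)(x).shape)           # torch.Size([2, 16, 512])
    print(SwiGLU(512, 1376)(x).shape)      # torch.Size([2, 16, 512])
    q = torch.randn(2, 16, 8, 64)          # (batch, seq, heads, head_dim)
    print(apply_rope(q).shape)             # torch.Size([2, 16, 8, 64])
```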

  1. Indeed, we are currently developing models of both smaller and larger sizes. Please stay tuned :)
  2. We'd like to develop a model that is different and, hopefully, better. A model that is thinner but taller (narrower hidden dimension, more layers) looks promising; see the rough comparison sketched after this list. That said, the difference between our model and LLaMA is likely marginal.
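
To make the "thinner but taller" trade-off concrete, here is a back-of-the-envelope parameter-count comparison. The configurations below are illustrative placeholders, not the actual Skywork or LLaMA-13B settings.

```python
# Rough comparison of a "wide and shallow" vs. a "thin and tall"
# decoder-only transformer at roughly equal parameter count.
# All numbers are illustrative, not real model configs.

def approx_params(n_layers: int, d_model: int, d_ff: int, vocab: int = 32000) -> int:
    """Approximate parameter count for a LLaMA-style decoder:
    attention (4 * d^2) + SwiGLU FFN (3 * d * d_ff) per layer, plus
    input and output embeddings."""
    per_layer = 4 * d_model ** 2 + 3 * d_model * d_ff
    return n_layers * per_layer + 2 * vocab * d_model

wide_shallow = approx_params(n_layers=40, d_model=5120, d_ff=13824)
thin_tall    = approx_params(n_layers=52, d_model=4608, d_ff=12288)
print(f"wide & shallow: {wide_shallow / 1e9:.1f}B params")  # ~13.0B
print(f"thin & tall:    {thin_tall / 1e9:.1f}B params")     # ~13.5B
```

The point is only that depth can be traded against width at a roughly constant parameter count; that trade-off is what "thinner but taller" refers to here.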