QwenLM / Qwen2.5

Qwen2.5 is the large language model series developed by Qwen team, Alibaba Cloud.

How was the Qwen1.5-MoE-A2.7B model initialized for training? #243

Closed · pkumc closed this 4 months ago

pkumc commented 7 months ago

@JustinLin610 The blog post says: "We first took the existing Qwen-1.8B and transformed it into Qwen1.5-MoE-A2.7B. Furthermore, introducing randomness at the initialization stage significantly speeds up convergence and yields better overall performance throughout pretraining." I have two questions:

  1. Was the model initialized by splitting Qwen1.5-1.8B's intermediate_size of 5504 into 4 smaller experts of 1376 dimensions each, and then introducing randomness by appending 32 randomly initialized dimensions to each expert, giving 1408 dimensions? And were the remaining non-MoE parameters inherited directly from Qwen1.5-1.8B? (See the sketch after this list.)
  2. After initialization, the blog also states: "Thanks to our initialization method, we do not need to train on the same number of tokens to reach strong performance, which also significantly reduces training cost." Roughly how many tokens were used for the continued pretraining?
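For concreteness, here is a minimal PyTorch sketch of the procedure hypothesized in question 1. This is purely the questioner's guess, not a confirmed Qwen recipe (the maintainers have not released details): it splits a dense FFN's weights along the intermediate dimension into 4 expert slices of 1376, then pads each slice with 32 randomly initialized dimensions to reach 1408. The function name, the padding scale `init_std`, and the weight layout (which follows the transformers Qwen2 MLP convention) are all assumptions.

```python
import torch

def upcycle_dense_ffn_to_experts(gate_w, up_w, down_w,
                                 num_experts=4, pad_dim=32, init_std=0.02):
    """Hypothetical upcycling sketch: split a dense FFN
    (intermediate_size=5504) into `num_experts` slices of 1376 and pad
    each with `pad_dim`=32 randomly initialized dimensions, yielding
    experts with intermediate size 1408. Not the confirmed recipe.

    Weight shapes follow the transformers Qwen2 MLP convention:
      gate_w, up_w: (intermediate_size, hidden_size)
      down_w:       (hidden_size, intermediate_size)
    """
    inter, hidden = gate_w.shape
    slice_dim = inter // num_experts  # 5504 // 4 = 1376
    experts = []
    for i in range(num_experts):
        sl = slice(i * slice_dim, (i + 1) * slice_dim)
        # Randomly initialized padding supplies the "randomness at
        # initialization" the blog mentions (the scale is an assumption).
        experts.append({
            "gate_proj": torch.cat(
                [gate_w[sl], torch.randn(pad_dim, hidden) * init_std], dim=0),
            "up_proj": torch.cat(
                [up_w[sl], torch.randn(pad_dim, hidden) * init_std], dim=0),
            "down_proj": torch.cat(
                [down_w[:, sl], torch.randn(hidden, pad_dim) * init_std], dim=1),
        })
    return experts  # each: gate/up (1408, hidden), down (hidden, 1408)
```

Under this hypothesis, with Qwen1.5-1.8B's shapes the resulting expert intermediate size of 1408 matches the question's arithmetic (1376 + 32), and all non-MoE weights would simply be copied over unchanged.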
JustinLin610 commented 6 months ago

Stay tuned for our upcoming tech report. For now, we are not releasing details about this.

github-actions[bot] commented 4 months ago

This issue has been automatically marked as inactive due to lack of recent activity. Should you believe it remains unresolved and warrants attention, kindly leave a comment on this thread.