pkumc closed this issue 4 months ago
Stay tuned for our upcoming tech report. For now, we are not releasing details about this.
This issue has been automatically marked as inactive due to lack of recent activity. Should you believe it remains unresolved and warrants attention, kindly leave a comment on this thread.
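@JustinLin610 The blog post says: "We first take the existing Qwen-1.8B and convert it into Qwen1.5-MoE-A2.7B. In addition, introducing randomness during initialization significantly speeds up convergence and leads to better overall performance throughout pre-training." I have two questions about this: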