aitsc / GLMKD

Are Intermediate Layers and Labels Really Necessary? A General Language Model Distillation Method ; GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model

On the performance of the 2B model in your latest paper #1

Closed: Kausal-Lei closed this issue 12 months ago

Kausal-Lei commented 12 months ago

Hello, I saw that you report results for directly fine-tuning the T1 (110M), T2 (340M), and T3 (10B) models, and that the S2 model distilled with your method achieves very good results. Have you also measured the performance of directly fine-tuning the S2 (2B) model? How much better is the S2 model distilled with your method compared with direct SFT?
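For readers unfamiliar with the distinction being asked about here, below is a minimal, hypothetical PyTorch sketch contrasting the two objectives: direct supervised fine-tuning (cross-entropy on gold labels only) versus standard logit-based knowledge distillation from a teacher. This is not the GLMKD implementation; the function names, temperature `T`, and mixing weight `alpha` are illustrative assumptions.

```python
# Hypothetical sketch of SFT vs. logit distillation (not the GLMKD code).
import torch
import torch.nn.functional as F

def sft_loss(student_logits, labels):
    # Direct fine-tuning: cross-entropy against the gold labels only.
    return F.cross_entropy(student_logits, labels)

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Distillation: soften both distributions with temperature T, match the
    # student to the teacher via KL divergence, and mix in the label loss.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale gradients after temperature softening
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage: a batch of 4 examples with a 10-class output.
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
print(sft_loss(student_logits, labels), kd_loss(student_logits, teacher_logits, labels))
```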

aitsc commented 12 months ago

See Section 6.3 of the paper.

Kausal-Lei commented 12 months ago

Thanks!