aitsc / GLMKD

Are Intermediate Layers and Labels Really Necessary? A General Language Model Distillation Method ; GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model

On the performance of the 2B model in your latest paper #1

Closed: Kausal-Lei closed this issue 12 months ago

Kausal-Lei commented 12 months ago

Hello, I saw that you report results for directly fine-tuning the T1 (110M), T2 (340M), and T3 (10B) models, and that the S2 model distilled with your method achieves very good results. Have you also measured the performance of directly fine-tuning the S2 (2B) model? How much better is the S2 model distilled with your method compared with direct SFT?
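For readers unfamiliar with the distinction being asked about here, below is a minimal, hypothetical PyTorch sketch contrasting the two objectives: direct supervised fine-tuning (cross-entropy on gold labels only) versus standard logit-based knowledge distillation from a teacher. This is not the GLMKD implementation; the function names, temperature `T`, and mixing weight `alpha` are illustrative assumptions.

```python
# Hypothetical sketch of SFT vs. logit distillation (not the GLMKD code).
import torch
import torch.nn.functional as F

def sft_loss(student_logits, labels):
    # Direct fine-tuning: cross-entropy against the gold labels only.
    return F.cross_entropy(student_logits, labels)

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Distillation: soften both distributions with temperature T, match the
    # student to the teacher via KL divergence, and mix in the label loss.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale gradients after temperature softening
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage: a batch of 4 examples with a 10-class output.
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
print(sft_loss(student_logits, labels), kd_loss(student_logits, teacher_logits, labels))
```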

aitsc commented 12 months ago

See Section 6.3 of the paper.

Kausal-Lei commented 12 months ago

Thanks!