[QUESTION] Does it support Knowledge Distillation?

NVIDIA / Megatron-LM

Ongoing research training transformer models at scale

https://docs.nvidia.com/megatron-core/developer-guide/latest/user-guide/index.html#quick-start


[QUESTION] Does it support Knowledge Distillation? #989

Closed: mushan09 closed this issue 2 months ago

mushan09 commented 3 months ago

Does it support Knowledge Distillation?

dongs0104 commented 3 months ago

Check NeMo: https://github.com/NVIDIA/NeMo/pull/9849. Distillation support has already been developed there, but it is not released yet.

Also, from the "Experiments and Results" section of https://arxiv.org/pdf/2407.14679: "We use the NVIDIA Megatron-LM framework [45] to implement our pruning and distillation algorithms for compression and retraining."
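
For anyone landing here who wants to see what a distillation objective looks like in general, here is a minimal sketch of the classic logit-based loss: the KL divergence between temperature-softened teacher and student distributions, scaled by the temperature squared. This is not the Megatron-LM or NeMo API and not necessarily what the paper above uses. The function names `log_softmax` and `distillation_loss` and the default temperature are assumptions made up for illustration, and it uses plain Python lists rather than tensors.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    # log-sum-exp with the max subtracted to avoid overflow
    lse = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - lse for x in scaled]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, times T^2.

    Hypothetical helper for illustration only; real frameworks operate
    on batched tensors and usually mix this with the normal LM loss.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return kl * temperature ** 2


if __name__ == "__main__":
    teacher = [1.0, 2.0, 3.0]
    student = [3.0, 2.0, 1.0]
    print(distillation_loss(student, teacher))
```

The loss is zero when the student's softened distribution matches the teacher's and grows as they diverge; in practice it would be computed per token over the vocabulary and averaged over the batch.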