bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Distill megatron - WIP draft code #351

Closed: younesbelkada closed this pull request 1 year ago

younesbelkada commented 1 year ago

An attempt to add knowledge distillation support to Megatron-DeepSpeed.
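The draft code itself is not shown in this thread. For orientation, the objective such a patch would typically implement is the standard knowledge-distillation loss (Hinton et al., 2015): a KL divergence between the teacher's and student's temperature-softened output distributions, scaled by T². The sketch below is a minimal, framework-free illustration in NumPy; the function names and signature are hypothetical, not taken from the PR.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 so gradients keep comparable magnitude across T.
    Hypothetical helper; not the PR's actual code."""
    t = temperature
    p_teacher = softmax(teacher_logits / t)
    log_p_student = np.log(softmax(student_logits / t))
    kl = (p_teacher * (np.log(p_teacher) - log_p_student)).sum(axis=-1)
    return (t * t) * kl.mean()

# Toy usage: batch of 2 examples, vocabulary of 4 tokens.
teacher = np.array([[2.0, 1.0, 0.1, -1.0], [0.5, 0.5, 0.5, 0.5]])
student = np.array([[1.5, 1.2, 0.0, -0.5], [0.4, 0.6, 0.5, 0.5]])
loss = distillation_loss(student, teacher)
```

In practice this term is combined with the usual cross-entropy against the hard labels, and in a Megatron-DeepSpeed setting the teacher forward pass must run under the same tensor/pipeline parallel layout as the student.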