bigscience-workshop/Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
Other
1.31k stars
213 forks
Distill megatron - WIP draft code
#351
Closed
younesbelkada closed 1 year ago
younesbelkada commented 1 year ago
An attempt to add distillation in Megatron-DeepSpeed
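For readers unfamiliar with the term, here is a rough sketch of what a distillation objective usually looks like. It is not the code from this PR, and it does not use any Megatron-DeepSpeed API. It assumes the common soft-target formulation, where the student is trained to match the teacher's temperature-softened output distribution via a KL divergence. The names `softmax`, `distillation_loss`, and the `temperature` default are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits (numerically stabilized)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exponentiating to avoid overflow
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Assumption: soft-target distillation with the conventional T^2 scaling,
    not necessarily what the draft in this PR implements.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    return kl * temperature ** 2


# Toy usage on a 3-token vocabulary: the loss is zero when the student
# reproduces the teacher's logits and positive otherwise.
teacher = [2.0, 0.5, -1.0]
loss_same = distillation_loss(teacher, teacher)
loss_diff = distillation_loss([0.0, 0.0, 0.0], teacher)
```

In practice this term is typically computed per token over the vocabulary and often combined with the ordinary language-modeling loss; how that is wired into model-parallel training is exactly what a PR like this one would need to work out.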