bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2
Other

Distill BLOOM - tentative 2 #354

Open: younesbelkada opened this pull request 1 year ago

younesbelkada commented 1 year ago

An attempt at applying teacher-student distillation using Megatron-DeepSpeed.
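The PR description does not say which distillation objective is used. The following is a minimal sketch that assumes the classic soft-target formulation, not necessarily what this PR implements. It blends the KL divergence between temperature-softened teacher and student distributions with ordinary cross-entropy on the ground-truth token, for a single token position. The function names and the `temperature` and `alpha` parameters are illustrative assumptions, and the code uses plain Python so it carries no framework dependency.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    # Scale by temperature, then subtract the max for numerical stability.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    # KL(p || q); terms where p_i == 0 contribute nothing.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    target: int,
    temperature: float = 2.0,
    alpha: float = 0.5,
) -> float:
    # Soft term (assumed formulation): KL between softened teacher and student
    # distributions, scaled by T^2 to keep gradient magnitudes comparable.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = kl_divergence(p_teacher, p_student) * temperature ** 2
    # Hard term: standard cross-entropy against the ground-truth token at T = 1.
    hard = -math.log(softmax(student_logits)[target])
    return alpha * soft + (1 - alpha) * hard


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]  # hypothetical teacher logits over a 3-token vocab
    student = [1.5, 0.8, 0.3]  # hypothetical student logits
    print(distillation_loss(student, teacher, target=0))
```

In a real Megatron-DeepSpeed run the same computation would be applied to batched, vocabulary-parallel logits from a frozen teacher model; that wiring is outside the scope of this sketch.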

WIP draft PR, not intended to be merged.

cc @thomasw21