bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2
Other
1.3k stars 211 forks

How can I set recomputation-granularity, like selective or full? #403

Open LordEdison opened 4 months ago