bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Megatron-DeepSpeed only applies to specific models? #381

Status: Open

Bob-cby commented 1 year ago

Does Megatron-DeepSpeed only target specific models such as GPT-2, or can it also support parallel partitioning of relatively lightweight models such as CLIP?
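For context on what "parallel partitioning" means here, below is a minimal, framework-free sketch of the column-parallel linear split that Megatron-style tensor parallelism is built on. This is an illustration only, not Megatron-DeepSpeed's actual implementation: the real code uses `torch.distributed` communication across GPU ranks, whereas this simulates the ranks with plain Python lists.

```python
# Illustrative sketch of column-parallel tensor partitioning (not the real
# Megatron-DeepSpeed code). Each "rank" holds a slice of the weight columns,
# computes a partial output, and the full output is the concatenation
# (the role played by an all-gather in a real distributed run).

def split_columns(weight, num_ranks):
    """Partition a weight matrix (stored as a list of columns) across ranks."""
    cols_per_rank = len(weight) // num_ranks
    return [weight[r * cols_per_rank:(r + 1) * cols_per_rank]
            for r in range(num_ranks)]

def matvec(x, cols):
    """Multiply row vector x by a set of columns -> partial output vector."""
    return [sum(a * b for a, b in zip(x, col)) for col in cols]

# Toy 4x4 diagonal weight, stored column-wise.
W = [[1, 0, 0, 0], [0, 2, 0, 0], [0, 0, 3, 0], [0, 0, 0, 4]]
x = [1, 1, 1, 1]

shards = split_columns(W, 2)                 # two "tensor-parallel ranks"
partials = [matvec(x, shard) for shard in shards]
out = [v for partial in partials for v in partial]  # concatenate = all-gather
# out == matvec(x, W) -> [1, 2, 3, 4]
```

The mechanism itself is model-agnostic (any linear layer can be split this way), which is partly why the question of CLIP support comes down to whether the repo's transformer-specific layer implementations cover the target architecture.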