bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

The difference between ZeRO-3 and Megatron with ZeRO-2 #395

Open nicosouth opened 1 year ago

nicosouth commented 1 year ago

Hi, I have looked up a lot of information, but I still don't understand the difference between ZeRO-3 and Megatron with ZeRO-2.

Both of them split the model.