bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

allocate embed norm only on pp0 #261

Closed stas00 closed 2 years ago

stas00 commented 2 years ago

Don't allocate the embedding LayerNorm on pp rank -1 (the last pipeline-parallel stage), since it is never used there and allocating it just wastes GPU memory.
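The pattern described above can be sketched as a simple rank-conditional allocation. This is a hypothetical, framework-free illustration (the class name `EmbeddingStage`, the parameter names, and the plain-list stand-in for a LayerNorm's weight and bias are all made up for this sketch, not the actual Megatron-DeepSpeed code):

```python
# Hypothetical sketch: allocate the post-embedding LayerNorm only on the
# first pipeline-parallel stage. The last stage shares the embedding
# weights for the output projection but never runs this LayerNorm, so
# allocating it there only wastes GPU memory.

class EmbeddingStage:
    def __init__(self, hidden_size, pp_rank, pp_world_size):
        self.is_first_stage = (pp_rank == 0)
        self.is_last_stage = (pp_rank == pp_world_size - 1)
        # Stand-in for a LayerNorm's parameters (weight + bias, i.e.
        # 2 * hidden_size values); only the first stage allocates it.
        if self.is_first_stage:
            self.embed_norm = [0.0] * (2 * hidden_size)
        else:
            self.embed_norm = None


first = EmbeddingStage(hidden_size=1024, pp_rank=0, pp_world_size=4)
last = EmbeddingStage(hidden_size=1024, pp_rank=3, pp_world_size=4)
print(first.embed_norm is not None)  # first stage allocates the norm
print(last.embed_norm is None)       # last stage skips the allocation
```

In a real pipeline-parallel model the same check would gate the construction of the actual `LayerNorm` module, so last-stage ranks never hold parameters they cannot use.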