Why does rank 0 consume more GPU memory than the other ranks on a single machine?

NVIDIA / Megatron-LM

Ongoing research training transformer models at scale


Closed: huangjundashuaige closed this issue 1 month ago

huangjundashuaige commented 3 years ago

Where does the extra memory consumption come from? Or am I simply using it wrong?
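The thread never identifies the cause, so this is only a hedged sketch of one frequent source of this pattern in PyTorch multi-process jobs. If a process creates any CUDA tensor or context before calling `torch.cuda.set_device`, PyTorch uses the default device, GPU 0. Every rank then leaves a small context on GPU 0, and `nvidia-smi` typically lists several PIDs on that GPU. The helper names `local_gpu_index` and `pin_device_early` are hypothetical. The code also assumes the launcher (for example `torchrun`) exports `LOCAL_RANK`, and falls back to `RANK % gpus_per_node` otherwise.

```python
import os
from typing import Mapping, Optional


def local_gpu_index(env: Optional[Mapping[str, str]] = None,
                    gpus_per_node: Optional[int] = None) -> int:
    """Return the GPU index this process should own (hypothetical helper).

    Assumes the launcher exports LOCAL_RANK; otherwise derives the index
    from the global RANK modulo the number of GPUs on the node.
    """
    env = os.environ if env is None else env
    if "LOCAL_RANK" in env:
        return int(env["LOCAL_RANK"])
    if gpus_per_node is None:
        raise ValueError("LOCAL_RANK is unset; pass gpus_per_node to derive it from RANK")
    return int(env.get("RANK", "0")) % gpus_per_node


def pin_device_early() -> int:
    """Select this rank's GPU before any CUDA allocation happens.

    Call this at the very start of each process. Creating a CUDA tensor
    before set_device would allocate a context on the default device (GPU 0).
    """
    import torch  # imported lazily so local_gpu_index works without torch installed

    # Only query the device count when LOCAL_RANK is missing.
    n = None if "LOCAL_RANK" in os.environ else torch.cuda.device_count()
    device = local_gpu_index(os.environ, gpus_per_node=n)
    torch.cuda.set_device(device)
    return device
```

A related check is checkpoint loading. Calling `torch.load` without `map_location` restores tensors onto the device they were saved from, which is often `cuda:0`. Passing `map_location="cpu"` (or the rank's own device) avoids that.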

github-actions[bot] commented 1 year ago

Marking as stale. No activity in 60 days. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 1 year ago

Marking as stale. No activity in 60 days.