NVIDIA / Megatron-LM

Ongoing research training transformer models at scale
https://docs.nvidia.com/megatron-core/developer-guide/latest/user-guide/index.html#quick-start

Why does rank 0 consume more GPU memory than the other ranks within a single machine? #78

Closed huangjundashuaige closed 1 month ago

huangjundashuaige commented 3 years ago

Where does the extra memory consumption on rank 0 come from? Or am I simply using it wrong?

github-actions[bot] commented 1 year ago

Marking as stale. No activity in 60 days. Remove stale label or comment or this will be closed in 7 days.
