NVIDIA / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
BSD 3-Clause "New" or "Revised" License

Add a warning for low distributed_fused_adam bucket utilization #1714

Closed · shjwudp closed this 1 year ago

shjwudp commented 1 year ago

If the bucket size in the distributed optimizer is not configured to fit the model, memory is wasted. The waste can be large yet invisible; for example, a misconfigured GPT-7B run can incur a 10 GB memory penalty on each GPU. I think emitting a warning when bucket utilization is low is a solution, and I have submitted my code for reference.
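For illustration, here is a minimal sketch of what such a check could look like. The helper name, parameters, and threshold below are assumptions for the example, not apex's actual internals or the code in this PR:

```python
import warnings

# Hypothetical helper illustrating the proposed check; the names
# `bucket_fill_bytes`, `bucket_capacity_bytes`, and `threshold`
# are assumptions, not apex's actual API.
def check_bucket_utilization(bucket_fill_bytes: int,
                             bucket_capacity_bytes: int,
                             threshold: float = 0.5) -> None:
    """Warn when the fraction of a gradient bucket actually filled
    with gradient data falls below `threshold`."""
    if bucket_capacity_bytes <= 0:
        return
    utilization = bucket_fill_bytes / bucket_capacity_bytes
    if utilization < threshold:
        unused = bucket_capacity_bytes - bucket_fill_bytes
        warnings.warn(
            f"DistributedFusedAdam bucket utilization is low "
            f"({utilization:.1%}); {unused} bytes per bucket are "
            f"allocated but unused. Consider tuning the bucket size "
            f"to reduce memory waste."
        )
```

The point of warning rather than failing is that low utilization is a performance and memory problem, not a correctness problem, so training can continue while the user is nudged toward a better bucket size.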

shjwudp commented 1 year ago

Hi Tim, it's great to see your reply! Thanks for your help in correcting it!