Open Xreki opened 3 months ago
Thanks for your contribution!
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 55.81%. Comparing base (5619cc3) to head (548db29). Report is 132 commits behind head on develop.
This Pull Request is stale because it has been open for 60 days with no activity.
PR types
Others
PR changes
Others
Description
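For the Llama-2 70B model with the training strategy tp4pp8-vpp5-mbs1-acc32 (sp enabled), training runs stably for 50 steps when the `release_grads` option is disabled. With `release_grads` enabled, training tends to hit OOM after a number of steps. The cause is that `release_grads` frees the memory occupied by gradients after every step and reallocates it in the next step. This increases the number of GPU memory operations, which in turn makes GPU memory fragmentation likely. Adding a GPU memory pre-allocation feature (`pre_alloc_memory`), which allocates one large block of GPU memory for training in advance, avoids the problem.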
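
To illustrate why repeated free/reallocate cycles can end in OOM and why reserving one large block up front helps, here is a minimal toy simulation. It is not Paddle's allocator and not the code in this PR: `ToyCachingAllocator`, `simulate`, the capacity of 100 units, and the request sizes in `STEPS` are all hypothetical. The model assumes a caching allocator that takes chunks from the device on demand and can merge free blocks only within a single chunk.

```python
class ToyCachingAllocator:
    """Toy model (hypothetical): memory is taken from the device in chunks,
    and free blocks can only be merged with neighbours in the same chunk."""

    def __init__(self, device_capacity):
        self.device_capacity = device_capacity
        self.device_used = 0   # total size of all chunks taken from the device
        self.chunks = []       # each chunk is a list of [offset, size, is_free]

    def _grow(self, size):
        # Ask the "device" for a new chunk; fail if capacity is exceeded.
        if self.device_used + size > self.device_capacity:
            raise MemoryError(f"OOM: need {size}, device has "
                              f"{self.device_capacity - self.device_used} left")
        self.device_used += size
        self.chunks.append([[0, size, True]])

    def _split(self, ci, bi, size):
        # Mark the first `size` units of a free block as used; keep the rest free.
        chunk = self.chunks[ci]
        off, bsize, _ = chunk[bi]
        chunk[bi] = [off, size, False]
        if bsize > size:
            chunk.insert(bi + 1, [off + size, bsize - size, True])
        return (ci, off)

    def alloc(self, size):
        # First fit over cached free blocks, otherwise grow by a new chunk.
        for ci, chunk in enumerate(self.chunks):
            for bi, (_, bsize, is_free) in enumerate(chunk):
                if is_free and bsize >= size:
                    return self._split(ci, bi, size)
        self._grow(size)
        return self._split(len(self.chunks) - 1, 0, size)

    def free(self, handle):
        ci, off = handle
        for block in self.chunks[ci]:
            if block[0] == off:
                block[2] = True
                break
        # Coalesce adjacent free blocks inside this chunk only.
        merged = []
        for block in self.chunks[ci]:
            if merged and merged[-1][2] and block[2]:
                merged[-1][1] += block[1]
            else:
                merged.append(block)
        self.chunks[ci] = merged


# Hypothetical per-step request sizes; everything is released at the end of a step.
STEPS = [(30, 20), (25, 25), (40,)]


def simulate(pre_alloc_size, device_capacity=100):
    """Return how many steps complete before an OOM (len(STEPS) if none)."""
    allocator = ToyCachingAllocator(device_capacity)
    if pre_alloc_size:
        # Reserve one large chunk and hand it straight back to the cache.
        allocator.free(allocator.alloc(pre_alloc_size))
    completed = 0
    try:
        for sizes in STEPS:
            handles = [allocator.alloc(s) for s in sizes]
            for h in handles:
                allocator.free(h)
            completed += 1
    except MemoryError:
        pass
    return completed


if __name__ == "__main__":
    print("without pre-allocation:", simulate(0), "steps completed")
    print("with pre-allocation:   ", simulate(100), "steps completed")
```

In this toy run the allocator without pre-allocation ends up holding three separate chunks (30, 20, and 25 units) that cannot be merged, so a request for 40 units fails even though 75 units are cached and free. With the single reserved chunk, freed blocks coalesce back into one region and every step completes.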