InternLM / InternEvo

InternEvo is an open-source, lightweight training framework that aims to support model pre-training without extensive dependencies.
https://internevo.readthedocs.io/zh-cn/latest/?badge=latest
Apache License 2.0

fix(isp.py): fix isp overlap backward allgather twice when activation ckpt 0.x #366

Open huangting4201 opened 3 weeks ago

huangting4201 commented 3 weeks ago

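Bug fix 1: The all-gather handle and the gathered weight were not deleted in sync. As a result, when activation checkpointing was set to a fractional value (0.x), the backward pass of checkpointed layers performed the all-gather twice: once for the recomputation and once for the backward itself. The handle and the weight are now deleted together, which fixes this bug.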
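
The failure mode in fix 1 can be reproduced with a toy model. This is a minimal sketch, not InternEvo's code: `IspGatherCache`, `ensure_gathered`, `after_recompute_forward`, `release` and `backward_of_checkpointed_layer` are hypothetical names, and the assumption is that "is this module already gathered?" is decided by looking up its handle. If the handle is dropped after the recomputation forward while the weight is kept, the backward step sees no handle and all-gathers again. Deleting both together keeps the bookkeeping consistent.

```python
from typing import Dict


class IspGatherCache:
    """Toy model of per-module all-gather bookkeeping (hypothetical, not InternEvo's API)."""

    def __init__(self, sync_delete: bool) -> None:
        self.sync_delete = sync_delete
        self.handles: Dict[str, object] = {}  # stand-ins for async all-gather handles
        self.weights: Dict[str, object] = {}  # stand-ins for gathered full weights
        self.num_all_gathers = 0

    def ensure_gathered(self, name: str) -> object:
        # Assumption: a new all-gather is issued only if no handle is tracked.
        if name not in self.handles:
            self.num_all_gathers += 1
            self.handles[name] = object()
            self.weights[name] = f"full<{name}>"
        return self.weights[name]

    def after_recompute_forward(self, name: str) -> None:
        if not self.sync_delete:
            # Buggy variant: the handle is dropped but the weight is kept,
            # so the two dicts disagree about whether the module is gathered.
            self.handles.pop(name, None)
        # Fixed variant: keep both; they are freed together in release().

    def release(self, name: str) -> None:
        # Free the handle and the gathered weight in the same step.
        self.handles.pop(name, None)
        self.weights.pop(name, None)


def backward_of_checkpointed_layer(cache: IspGatherCache, name: str) -> int:
    """Simulate the backward phase of one checkpointed layer; return all-gather count."""
    cache.ensure_gathered(name)          # recomputation forward
    cache.after_recompute_forward(name)
    cache.ensure_gathered(name)          # the backward computation itself
    cache.release(name)
    return cache.num_all_gathers


print(backward_of_checkpointed_layer(IspGatherCache(sync_delete=False), "mlp.w1"))  # 2
print(backward_of_checkpointed_layer(IspGatherCache(sync_delete=True), "mlp.w1"))   # 1
```

Keying the check on a single source of truth, and always mutating handle and weight as a pair, is what prevents the second all-gather in this model.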

Bug fix 2: Fix activation_type (swiglu/gelu) being passed incorrectly when initializing the MLP.
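
For fix 2, the sketch below shows what `activation_type` selects and one way to pass it through safely. It is a hypothetical illustration, not InternEvo's MLP: `ToyMLP`, `build_mlp` and the `model_config` dict are assumed names, and the MLP is reduced to its elementwise activation step on two hidden projections `h1` and `h2`. Passing the value by keyword avoids it silently falling back to the constructor default or landing in the wrong positional slot.

```python
import math
from typing import List


def silu(x: float) -> float:
    # SiLU / Swish: x * sigmoid(x).
    return x / (1.0 + math.exp(-x))


def gelu(x: float) -> float:
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


class ToyMLP:
    """Hypothetical MLP reduced to its activation step; not InternEvo's class."""

    SUPPORTED = ("swiglu", "gelu")

    def __init__(self, activation_type: str = "swiglu") -> None:
        if activation_type not in self.SUPPORTED:
            raise ValueError(f"unsupported activation_type: {activation_type!r}")
        self.activation_type = activation_type

    def activate(self, h1: List[float], h2: List[float]) -> List[float]:
        if self.activation_type == "swiglu":
            # SwiGLU: SiLU(h1) * h2, elementwise (gated).
            return [silu(a) * b for a, b in zip(h1, h2)]
        # GELU MLP: no gate, so h2 is ignored.
        return [gelu(a) for a in h1]


def build_mlp(model_config: dict) -> ToyMLP:
    # Forward the configured activation by keyword.
    return ToyMLP(activation_type=model_config.get("activation_type", "swiglu"))


mlp = build_mlp({"activation_type": "gelu"})
print(mlp.activation_type)  # gelu
```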