Some confusion about the method of the paper

JorunoJobana commented 1 year ago

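Hello, standard backpropagation via the chain rule uses the gradient computed in the previous step, but the method in the paper does not store gradients after the update. Does that mean later gradient computations involve extra, repeated computation, similar to trading time for space? Is this understanding correct?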

QipengGuo commented 1 year ago

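There is no recomputation. The chain rule only uses the gradient from the previous step, not the gradients from all earlier steps. The previous step's gradient is still kept, while older gradients are freed on the fly.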
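To spell the chain-rule argument out, here is a short sketch under assumed notation: layer $i$ computes $h_i = f_i(h_{i-1}; P_i)$, $L$ is the loss, $G_i$ is the gradient of $P_i$, and $\delta_i = \partial L / \partial h_i$ is the gradient flowing backward through the activations. The symbols $h_i$, $f_i$ and $\delta_i$ are introduced here for illustration and are not taken from the paper.

$$
G_i = \left(\frac{\partial h_i}{\partial P_i}\right)^{\top} \delta_i,
\qquad
\delta_{i-1} = \left(\frac{\partial h_i}{\partial h_{i-1}}\right)^{\top} \delta_i
$$

Both quantities depend only on $\delta_i$ and on values saved in the forward pass. Neither formula contains $G_{i+1}$ or any earlier parameter gradient, so once $P_{i+1}$ has been updated with $G_{i+1}$, that gradient can be freed without forcing any recomputation later.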

JorunoJobana commented 1 year ago

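Thank you very much for taking the time to reply. Then for this figure in the paper, strictly speaking, shouldn't the gradient of the previous node also be a solid circle (In Memory)? For example, when computing the gradient G2 of parameter P2, gradient G3 is also in memory.

Thank you for your guidance.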



QipengGuo commented 1 year ago

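The concepts here are a bit subtle. The previous step's gradient is still in GPU memory, but it is not actually stored at G3. The G3 in our figure refers to the `.grad` attribute. In practice, besides `.grad`, PyTorch also holds gradients in the computation graph. This involves PyTorch's autograd graph and how gradients are passed along that graph.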
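The distinction can be illustrated with a plain-Python toy. It is only a sketch, not PyTorch and not LOMO's actual implementation: it assumes a scalar chain `h_i = P_i * h_{i-1}` with the toy loss `L = h_n`, and the names `forward`, `standard_step`, `fused_step`, `upstream` and `grad_slot` are hypothetical. `grad_slot` plays the role of a parameter's `.grad`, while `upstream` plays the role of the gradient that is passed along the graph.

```python
def forward(params, x):
    """Return activations [h_0, ..., h_n] for the chain h_i = P_i * h_{i-1}."""
    acts = [x]
    for p in params:
        acts.append(p * acts[-1])
    return acts


def standard_step(params, x, lr):
    """Keep one gradient slot per parameter, then update everything at the end."""
    acts = forward(params, x)
    upstream = 1.0                      # dL/dh_n for the toy loss L = h_n
    grads = [0.0] * len(params)         # all parameter gradients stay alive
    for i in reversed(range(len(params))):
        grads[i] = upstream * acts[i]   # gradient of params[i]
        upstream = upstream * params[i] # gradient passed on to the earlier layer
    return [p - lr * g for p, g in zip(params, grads)]


def fused_step(params, x, lr):
    """Update each parameter as soon as its gradient exists, then drop the gradient."""
    params = list(params)               # do not mutate the caller's list
    acts = forward(params, x)
    upstream = 1.0                      # lives outside any parameter's slot
    for i in reversed(range(len(params))):
        grad_slot = upstream * acts[i]  # the only parameter gradient alive right now
        upstream = upstream * params[i] # computed with the not-yet-updated weight
        params[i] -= lr * grad_slot     # update in place
        grad_slot = None                # the parameter gradient is released here
    return params


if __name__ == "__main__":
    start = [0.5, -1.5, 2.0]
    print(standard_step(start, 3.0, 0.1))
    print(fused_step(start, 3.0, 0.1))
```

Both functions return the same updated parameters. In the fused version only `upstream` survives from one layer to the next, which mirrors the point above: what the chain rule reuses is the gradient carried through the graph, not the parameter gradient drawn as G3.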

OpenLMLab / LOMO
