Closed: jaidhyani closed this issue 6 months ago
Thinking about it a little more, I'm 85% confident it doesn't work: you're re-assigning a new expression to the loss variable in the for loop, so the previous data, including the gradient, is lost.
But the loss variable never stored the gradient, right? The gradient is stored on the model tensors as a side effect of loss.backward().
We're definitely going to add tests to validate, but I expect the current implementation to be correct. See https://stackoverflow.com/posts/62076913/revisions
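For illustration, here's a minimal standalone sketch (a toy loop, not our actual training code) of why re-binding the loss name is fine: backward() adds into each parameter's .grad, which lives on the model, not on loss.

```python
import torch

# Toy stand-in for the real model.
model = torch.nn.Linear(4, 1)
model.zero_grad()

for step in range(3):
    x = torch.randn(8, 4)
    y = torch.randn(8, 1)
    # `loss` is re-bound to a fresh tensor every iteration...
    loss = torch.nn.functional.mse_loss(model(x), y)
    # ...but backward() *adds* into each parameter's .grad,
    # so gradients from earlier iterations are preserved.
    loss.backward()

# .grad now holds the sum over all three iterations; it was
# never stored on the `loss` variable itself.
print(model.weight.grad)
```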
ha, good point!
Factored out the gradient accumulation logic and added tests for it in #89; confirmed that gradients do be accumulating.
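Roughly, the property the tests pin down looks like this (a sketch with illustrative names, not the literal test code from #89): two accumulated backward passes should match a single pass on the combined loss.

```python
import torch

def test_gradients_accumulate_across_backward_calls():
    # Sketch, not the actual #89 test: gradients from two backward
    # calls should sum to the gradient of the combined loss.
    torch.manual_seed(0)
    model = torch.nn.Linear(4, 1)
    x1, x2 = torch.randn(8, 4), torch.randn(8, 4)

    # Two backward calls, gradients accumulating in .grad.
    model.zero_grad()
    model(x1).sum().backward()
    model(x2).sum().backward()
    accumulated = model.weight.grad.clone()

    # One backward call on the summed loss.
    model.zero_grad()
    (model(x1).sum() + model(x2).sum()).backward()

    assert torch.allclose(accumulated, model.weight.grad)
```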
Check that the gradient accumulation logic works as expected. The code as-is looks kind of suspect at the moment.