delphi-suite / delphi

small language models training made easy
Apache License 2.0

test and validate gradient accumulation logic (+ fix if necessary) #79

Closed: jaidhyani closed this issue 6 months ago

jaidhyani commented 6 months ago

Check that the gradient accumulation logic works as expected. The code as-is looks kind of suspect at the moment.

jettjaniak commented 6 months ago

Thinking about it a little bit more, I'm 85% confident it doesn't work: you're re-assigning a new expression to the loss variable in the for loop, so the previous data, including the gradient, is lost.

jaidhyani commented 6 months ago

But the loss variable never stored the gradient, right? The gradient is stored on the model's parameter tensors as a side effect of loss.backward().

We're definitely going to add tests to validate, but I expect the current implementation to be correct. See https://stackoverflow.com/posts/62076913/revisions
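
To make the point concrete, here's a minimal sketch (hypothetical model and data, not the actual delphi training loop) showing that re-assigning loss each iteration doesn't discard gradients: backward() adds into each parameter's .grad field, which persists until it's explicitly zeroed.

```python
import torch

# Hypothetical model and data, standing in for the real training loop.
model = torch.nn.Linear(4, 1)
batches = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(3)]

for x, y in batches:
    # `loss` is re-assigned to a fresh tensor every iteration...
    loss = torch.nn.functional.mse_loss(model(x), y)
    # ...but backward() adds into each parameter's .grad field, so
    # gradients accumulate across iterations regardless.
    loss.backward()

# .grad now holds the sum of the three per-batch gradients; it is only
# cleared by an explicit zero_grad() (or by setting .grad to None).
print(model.weight.grad)
```

This is also why optimizer.zero_grad() has to be called between optimizer steps: PyTorch deliberately accumulates gradients rather than overwriting them.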


jettjaniak commented 6 months ago

ha, good point!

On Tue, 26 Mar 2024 at 12:58, Jai Dhyani @.***> wrote:

But the loss variable never stored the gradient, right? The gradient is stored on the model tensors as a side effect of loss.backward.

We're definitely going to add tests to validate, but I expect the current implementation to be correct. See https://stackoverflow.com/posts/62076913/revisions

On Tue, Mar 26, 2024, 10:34 AM Jett @.***> wrote:

thinking about a little bit more, I'm 85% confident it doesn't work you're re-assigning a new expression to the loss variable, so the previous data including gradient is lost

— Reply to this email directly, view it on GitHub < https://github.com/delphi-suite/delphi/issues/79#issuecomment-2021068909>,

or unsubscribe < https://github.com/notifications/unsubscribe-auth/AAC5BY5YM2OKWJOKQHHSIV3Y2GWTTAVCNFSM6AAAAABFIHOADSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMRRGA3DQOJQHE>

. You are receiving this because you authored the thread.Message ID: @.***>

— Reply to this email directly, view it on GitHub https://github.com/delphi-suite/delphi/issues/79#issuecomment-2021351106, or unsubscribe https://github.com/notifications/unsubscribe-auth/AECKP3YDDZU6DOXRMUIOLJTY2HHN7AVCNFSM6AAAAABFIHOADSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMRRGM2TCMJQGY . You are receiving this because you commented.Message ID: @.***>

jaidhyani commented 6 months ago

Factored out the gradient accumulation logic and added tests for it in #89; confirmed that gradients do be accumulating.
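
For reference, a test along these lines (a sketch, not the actual test code from #89) can verify the behavior: the gradient left after several backward() calls should equal the sum of the per-batch gradients computed separately.

```python
import torch

def test_gradients_accumulate():
    torch.manual_seed(0)
    model = torch.nn.Linear(4, 1)
    batches = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(3)]

    # Accumulate gradients across all micro-batches without zeroing.
    for x, y in batches:
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
    accumulated = model.weight.grad.clone()

    # Recompute each batch's gradient in isolation and sum them.
    expected = torch.zeros_like(accumulated)
    for x, y in batches:
        model.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        expected += model.weight.grad

    assert torch.allclose(accumulated, expected)
```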