EdwardRaff / Inside-Deep-Learning

Inside Deep Learning: The math, the algorithms, the models

a question about auto grad. thx #11

Open yebangyu opened 1 year ago

yebangyu commented 1 year ago

Dear Edward,

From page 21 to page 23, when we are talking about autograd,

we test whether the condition ||prev - cur|| < epsilon is satisfied to check whether we have reached the minimum.

My question is: why not just test whether the gradient of cur is (close to) zero?

That is to say:

can

while torch.linalg.norm(x_cur-x_prev) > epsilon:

be replaced by

epsilon = 1e-12 # a sufficiently small value

while abs(cur.grad) > epsilon:

?
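For concreteness, what I have in mind is roughly the following sketch (assuming x_cur is the tensor being optimized, so its gradient is in x_cur.grad after backward(); I use a norm so the test also works when x_cur has more than one element):

epsilon = 1e-12  # a sufficiently small value
while torch.linalg.norm(x_cur.grad) > epsilon:  # stop once the gradient is (numerically) zero
    ...  # one gradient descent update of x_cur, then recompute x_cur.grad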

Thanks a lot!

EdwardRaff commented 1 year ago

Good question! For most neural networks we don't use either of these kinds of stopping conditions, so this is more an example of optimization than the ideal way to train a neural network.

The broader answer, in the context of optimization problems, is that both are valid with different trade-offs. The zero-gradient check gets you closer to a local minimum. The difference between previous and current iterates might stop before you reach a local minimum, but it also might save you from waiting a very long time if the function is badly behaved and converging very slowly.

I also just like the second approach because it's more general purpose. Sometimes you want to check the convergence of something other than the gradient, which may not converge/minimize near zero.
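To see the trade-off concretely, here is a minimal sketch (a toy example, not the listing from the book) that runs plain gradient descent on f(x) = (x - 2)**2 with either stopping rule; the names f, minimize, stop_on, eta, and epsilon are purely illustrative:

import torch

def f(x):
    return (x - 2.0) ** 2                      # toy objective, minimum at x = 2

def minimize(stop_on, eta=0.1, epsilon=1e-6, max_steps=100_000):
    x = torch.tensor([0.0], requires_grad=True)
    x_prev = x.detach().clone()
    for step in range(max_steps):
        if x.grad is not None:
            x.grad.zero_()                     # clear the previous gradient
        f(x).backward()                        # autograd fills x.grad
        if stop_on == "grad" and torch.linalg.norm(x.grad) < epsilon:
            break                              # gradient is (numerically) zero
        x_prev = x.detach().clone()
        with torch.no_grad():
            x -= eta * x.grad                  # one gradient descent step
        if stop_on == "step" and torch.linalg.norm(x.detach() - x_prev) < epsilon:
            break                              # iterates have stopped moving
    return x.item(), step

print("step check:", minimize("step"))         # stops a little earlier
print("grad check:", minimize("grad"))         # ends a little closer to x = 2

On this well-behaved function both rules terminate quickly; the gradient check just runs a few more iterations and lands a little closer to the true minimum, while on a slowly converging function the gap can be much larger.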


yebangyu commented 1 year ago

Thanks for your reply, Edward.

According to the SGD formula:

x_cur = x_prev - learning_rate * grad

if grad is close to zero, then x_cur is approximately equal to x_prev, but not vice versa:

x_cur being approximately equal to x_prev does not mean that grad is close to zero; it may just be that the learning rate is very small (see the small numeric check below).

Am I right?
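To make that concrete, here is a tiny check (a toy example with f(x) = (x - 2)**2 and a deliberately tiny learning rate; float64 is used only so round-off does not hide the effect):

import torch

x_prev = torch.tensor([3.0], dtype=torch.float64, requires_grad=True)
learning_rate = 1e-9                 # deliberately tiny
epsilon = 1e-6

f = (x_prev - 2.0) ** 2              # toy objective, minimum at x = 2
f.backward()                         # x_prev.grad = 2 * (3 - 2) = 2

with torch.no_grad():
    x_cur = x_prev - learning_rate * x_prev.grad

print(torch.linalg.norm(x_cur - x_prev).item())   # ~2e-9, already below epsilon
print(torch.linalg.norm(x_prev.grad).item())      # 2.0, nowhere near zero

The step between iterates is already below epsilon, so the ||x_cur - x_prev|| test would stop here, even though the gradient is 2 and we are nowhere near the minimum.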