yebangyu opened this issue 1 year ago
Good question! For most neural networks we don't use either of these kinds of stopping conditions, so it's more an example of optimization than the ideal way to train a neural network. The broader answer, in the context of optimization problems, is that both are valid with different trade-offs. The zero-gradient check gets you closer to a local minimum. The difference between previous and current iterates might stop before you reach a local minimum, but it also might save you from waiting a very long time if the function is badly behaved and converging very slowly. I also just like the second approach because it's more general purpose: sometimes you want to check the convergence of something other than the gradient, which may not converge/minimize near zero.
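For a concrete picture of the two criteria, here is a minimal plain-Python sketch (not the book's code; the gradient of f(x) = x² is written by hand rather than computed with autograd) that runs gradient descent under each stopping rule:

```python
def gradient_descent(x, lr, stop, max_iters=100_000):
    """Minimize f(x) = x**2 (gradient 2*x) until stop(prev, cur, grad) fires."""
    for _ in range(max_iters):
        grad = 2 * x                      # df/dx for f(x) = x**2
        x_prev, x = x, x - lr * grad      # one gradient-descent step
        if stop(x_prev, x, grad):
            break
    return x

eps = 1e-12

# Criterion 1: stop when consecutive iterates barely move.
x_step = gradient_descent(5.0, lr=0.1, stop=lambda p, c, g: abs(c - p) < eps)

# Criterion 2: stop when the gradient itself is nearly zero.
x_grad = gradient_descent(5.0, lr=0.1, stop=lambda p, c, g: abs(g) < eps)

print(x_step, x_grad)   # on this well-behaved function, both end up near 0
```

On a smooth convex function like this, both rules land essentially at the minimizer; the trade-offs in the reply above only show up when the function is badly conditioned or the learning rate is extreme.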
Thanks for your reply, Edward.

According to the SGD formula:

x_cur = x_prev - learning_rate * grad

if grad is close to zero, then x_cur is approximately equal to x_prev, but not vice versa: x_cur being approximately equal to x_prev does not mean that grad is close to zero (the step may be tiny simply because the learning rate is too small).

Am I right?
Dear Edward,

On pages 21 to 23, when we are talking about autograd, we test whether the condition ||prev - cur|| < epsilon is satisfied to check whether we have reached the minimum.

My question is: why not just test whether the gradient at cur is zero (or close to zero)?

That is to say, can

while torch.linalg.norm(x_cur - x_prev) > epsilon:

be replaced by

epsilon = 1e-12 # a small enough value
while abs(cur.grad) > epsilon:

?

Thanks a lot!
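For reference, the gradient-based test from the question can be sketched in PyTorch roughly as follows. This is not the book's loop; it assumes a simple hand-rolled gradient-descent on f(x) = sum(x²), and uses torch.linalg.norm(x.grad) rather than abs() so it also works when x is a vector (note that a tolerance like 1e-12 on the gradient is usually far too strict in float32; 1e-6 is used here instead):

```python
import torch

epsilon = 1e-6      # gradient tolerance (an illustrative choice, not from the book)
lr = 0.1

x = torch.tensor([5.0, -3.0], requires_grad=True)
for _ in range(10_000):
    loss = (x ** 2).sum()
    loss.backward()                               # populates x.grad
    with torch.no_grad():
        if torch.linalg.norm(x.grad) < epsilon:   # the proposed grad-based test
            break
        x -= lr * x.grad                          # one gradient-descent step
    x.grad.zero_()                                # reset accumulated gradient

print(x)   # close to the minimizer at the origin
```

On this well-behaved objective the gradient test works fine; the trade-offs Edward describes above apply when the gradient converges very slowly or the quantity being monitored does not go to zero at the solution.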