The idea is that you want to move each pixel by at most step_size in each iteration while maximizing the loss. In other words, you want to move along the gradient direction as much as possible without changing any pixel by more than step_size. If you think about it, this corresponds exactly to moving by step_size * sign(grad) (if the gradient of a pixel is positive you add step_size, if it is negative you subtract). In the convex optimization literature this is known as L∞-based gradient descent.
In principle, you could also just take steps along the gradient. However, we found that this takes much longer to converge and makes tuning the learning rate harder.
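For concreteness, here is a minimal sketch of what one such step might look like in a PyTorch-style setup. The names (`model`, `loss_fn`, `x_orig`, `epsilon`, `step_size`) are placeholders for illustration, not the actual implementation in this repository:

```python
import torch

def pgd_step(model, loss_fn, x, y, x_orig, step_size, epsilon):
    # One L-infinity PGD step: every pixel moves by exactly step_size
    # in the direction that increases the loss.
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)
    loss.backward()
    with torch.no_grad():
        x_adv = x + step_size * x.grad.sign()
        # Alternative discussed above: step along the raw gradient instead,
        # which converges more slowly and is harder to tune:
        # x_adv = x + step_size * x.grad
        # Project back into the epsilon-ball around the original image
        # and into the valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x_orig - epsilon), x_orig + epsilon)
        x_adv = torch.clamp(x_adv, 0.0, 1.0)
    return x_adv.detach()
```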
More of a question than an issue.
It can be inferred from here that PGD steps along the sign of the gradient.
Is there any reason it does not simply step along the gradient? I.e.
`x += gradient(x)*step_size`
instead of `x += sign(gradient(x))*step_size`?
Thanks