Closed filippobistaffa closed 3 years ago
The optimization diverges because the gradient descent step is too large.
From the documentation:
step : float
The gradient-descent step-size. This parameter is bounded by 0 and
:math:`\frac{2}{\beta}` where :math:`\beta` is the Lipschitz constant
of the gradient of the smooth function (or a sum of smooth functions).
Default is 1.
The following works:
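# Imports assume the pyunlocbox package, whose API the calls below match.
import numpy as np
from pyunlocbox import functions, solvers

n = 3
a = 2
# Stack a copies of the n x n identity matrix.
A = np.eye(n)
A = np.concatenate([A] * a, 0)
x = np.random.rand(A.shape[1])
y = A @ x
f = functions.norm_l2(y=y, A=A)
# np.linalg.norm of a matrix is the Frobenius norm, an upper bound on the
# spectral norm of A.T @ A, so this step is on the conservative side.
step = 1 / np.linalg.norm(A.T @ A)
solver = solvers.gradient_descent(step=step)
x0 = np.ones_like(x)
ret = solvers.solve([f], x0, solver, atol=1e-5, verbosity='HIGH')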
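To make the effect of the step size concrete, here is a minimal NumPy-only sketch of plain gradient descent on the same problem. It is not the library's implementation. It assumes the objective is the squared norm ||Ax - y||^2 without a 1/2 factor, so the gradient is 2 A^T (Ax - y) and its Lipschitz constant is beta = 2 * sigma_max(A)^2. The helper name `gradient_descent` and the iteration count are made up for illustration.

```python
import numpy as np

def gradient_descent(A, y, x0, step, n_iter=200):
    """Plain gradient descent on f(x) = ||A x - y||^2 (hypothetical helper)."""
    x = x0.copy()
    for _ in range(n_iter):
        grad = 2 * A.T @ (A @ x - y)  # gradient of the squared norm, no 1/2 factor
        x = x - step * grad
    return x

rng = np.random.default_rng(0)
n, a = 3, 2
A = np.concatenate([np.eye(n)] * a, axis=0)  # a stacked copies of the identity
x_true = rng.random(n)
y = A @ x_true
x0 = np.ones(n)

# Lipschitz constant of the gradient under the assumption above.
beta = 2 * np.linalg.norm(A, 2) ** 2  # np.linalg.norm(A, 2) is the spectral norm

x_small = gradient_descent(A, y, x0, step=1 / beta)  # inside (0, 2 / beta)
x_default = gradient_descent(A, y, x0, step=1.0)     # outside the bound when a = 2

err_small = np.linalg.norm(x_small - x_true)
err_default = np.linalg.norm(x_default - x_true)
print(f"beta = {beta:.3f}, upper bound on step = {2 / beta:.3f}")
print(f"error with step 1/beta: {err_small:.2e}")
print(f"error with step 1.0:    {err_default:.2e}")
```

Under the same assumption, a = 1 gives beta = 2, so a step of 1 sits exactly on the bound 2 / beta and the iterates would oscillate around the solution rather than converge.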
Ok, thanks for the clarification! Anyway, I also tried the forward_backward solver with fista_backtracking acceleration and it works out of the box.
I correct myself: forward_backward with fista_backtracking acceleration works out of the box only if I manually define f as
f = functions.func()
f._eval = lambda xx: .5 * np.linalg.norm(A @ xx - y) ** 2
f._grad = lambda xx: A.T @ (A @ xx - y)
but it does not compute the correct solution if I use f = functions.norm_l2(y=y, A=A). Why is that?
Maybe because your function is not strictly equal to functions.norm_l2(y=y, A=A), which doesn't divide the norm by 2?
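A quick way to check this is to compare the manual gradient against finite differences of both objectives. The sketch below uses NumPy only and assumes, as suggested above, that the library's function corresponds to ||Ax - y||^2 without the 1/2 factor. The names `f_full`, `f_half` and `numerical_grad` are hypothetical helpers, not part of any library.

```python
import numpy as np

rng = np.random.default_rng(0)
n, a = 3, 2
A = np.concatenate([np.eye(n)] * a, axis=0)
y = A @ rng.random(n)
x = rng.random(n)

def f_full(x):
    # Squared norm without the 1/2 factor (assumed form of the library function).
    return np.linalg.norm(A @ x - y) ** 2

def f_half(x):
    # Manually defined objective from the thread, with the 1/2 factor.
    return 0.5 * np.linalg.norm(A @ x - y) ** 2

def numerical_grad(f, x, eps=1e-6):
    """Central finite-difference approximation of the gradient of f at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

grad_manual = A.T @ (A @ x - y)  # the manually defined _grad

matches_half = np.allclose(numerical_grad(f_half, x), grad_manual, atol=1e-6)
matches_full = np.allclose(numerical_grad(f_full, x), 2 * grad_manual, atol=1e-6)
print(matches_half, matches_full)
```

If the assumption holds, the two gradients differ by a factor of 2, and so do their Lipschitz constants. A step size that is admissible for the manual definition can therefore be too large for the version without the 1/2 factor.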
I am trying to solve a very simple version of the least-squares problem in which the matrix A is obtained by stacking a copies of the identity matrix of size n. For the purposes of this exercise, y is just a vector with a copies of a random vector x of n elements, i.e., the trivial solution to the problem is just x.
The simplest version would be, e.g.:
Even in this very simple case, the solver does not converge to a solution. Indeed, the objective function increases until the maximum number of iterations is reached. Notice that np.linalg.lstsq(A, y) works just fine. Even in the extremely naive case with a = 1, the solver stops but fails to compute the obvious solution (x).
I also tried manually to define f as
In this case, the solver correctly computes the obvious solution with a = 1, stops but returns an incorrect solution for a = 2, and does not converge after that.
What am I doing wrong?
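For reference, the setup described in this question can be reproduced with NumPy alone. This is only a sketch of the problem as stated in the text (stacked identities, y made of a copies of x), with the direct least-squares solve as a baseline. The seed and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, a = 3, 2
A = np.concatenate([np.eye(n)] * a, axis=0)  # shape (a * n, n)
x = rng.random(n)
y = A @ x                                    # same as a stacked copies of x

# Direct least-squares solve; rcond=None selects NumPy's current default cutoff.
x_hat, residuals, rank, sing_vals = np.linalg.lstsq(A, y, rcond=None)

print(np.allclose(y, np.tile(x, a)))  # y really is a copies of x
print(np.allclose(x_hat, x))          # lstsq recovers the trivial solution
```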