Issue with very simple least-squares problem

filippobistaffa commented 4 years ago

I am trying to solve a very simple version of the least-squares problem in which the matrix A is obtained by stacking a copies of the identity matrix of size n. For the purposes of this exercise, y is just a vector with a copies of a random vector x of n elements, i.e., the trivial solution to the problem is just x.

The most simple version would be e.g.:

n = 3
a = 2
A = np.eye(n)
A = np.concatenate([A] * a, 0)
x = np.random.rand(A.shape[1])
y = A @ x
f = functions.norm_l2(y=y, A=A) 
solver = solvers.gradient_descent()
x0 = np.ones_like(x)
ret = solvers.solve([f], x0, solver, atol=1e-5, verbosity='HIGH')

Even in this very simple case, the solver does not converge to a solution. Indeed, the objective function increases until the maximum number of iterations is reached. Notice that np.linalg.lstsq(A, y) works just fine. Even in the extremely naive case with a = 1, the solver stops but fails to compute the obvious solution (x).

I also tried manually to define f as

f = functions.func()
f._eval = lambda xx: .5 * np.linalg.norm(A @ xx - y) ** 2
f._grad = lambda xx: A.T @ (A @ xx - y)

In this case, the solver correctly computes the obvious solution with a = 1, stops but returns an incorrect solution for a = 2, and does not converge after that.

What am I doing wrong?

mdeff commented 4 years ago

The optimization diverges because the gradient descent step is too large.

Gradient descent convergence and divergence.

From the documentation:

step : float
    The gradient-descent step-size. This parameter is bounded by 0 and
    :math:`\frac{2}{\beta}` where :math:`\beta` is the Lipschitz constant
    of the gradient of the smooth function (or a sum of smooth functions).
    Default is 1.

The following works:

n = 3
a = 2
A = np.eye(n)
A = np.concatenate([A] * a, 0)
x = np.random.rand(A.shape[1])
y = A @ x
f = functions.norm_l2(y=y, A=A) 
step = 1 / np.linalg.norm(A.T @ A)
solver = solvers.gradient_descent(step=step)
x0 = np.ones_like(x)
ret = solvers.solve([f], x0, solver, atol=1e-5, verbosity='HIGH')

filippobistaffa commented 4 years ago

Ok thanks for the clarification! Anyway, I also tried the forward_backward solver with fista_backtracking acceleration and it works out of the box.

filippobistaffa commented 4 years ago

Ok thanks for the clarification! Anyway, I also tried the forward_backward solver with fista_backtracking acceleration and it works out of the box.

I correct myself: forward_backward with fista_backtracking acceleration works out of the box only if I manually define f as

f = functions.func()
f._eval = lambda xx: .5 * np.linalg.norm(A @ xx - y) ** 2
f._grad = lambda xx: A.T @ (A @ xx - y)

but it does not compute the correct solution if I use f = functions.norm_l2(y=y, A=A). Why is that?

mdeff commented 4 years ago

Maybe because your function is not strictly equal to functions.norm_l2(y=y, A=A), which doesn't divide the norm by 2?

epfl-lts2 / pyunlocbox

Issue with very simple least-squares problem #32