Pressio / pressio

core C++ library

evaluate current nonlinear solvers capabilities #219

Closed fnrizzi closed 2 years ago

sdbond commented 4 years ago

Some things to check, and some comments, for when there are problems with the nonlinear least-squares solver ...

  1. Is the convergence exhibiting a "zig zag" or "fluttering" path? This can happen when the problem is poorly conditioned; for some methods it results in a poor update direction.
  2. Does the problem get better or worse when you increase the dimension of the ROM?
  3. Is the solver stagnating in the inner linear solver iteration, or in the outer nonlinear optimization iteration?
  4. Is the Jacobian rank-deficient or nearly rank-deficient at the solution, or somewhere along the path to the solution?
  5. For rank-deficient and nearly rank-deficient problems it may be necessary to regularize the problem. The L-M method does this by adding a scaled identity matrix, but you can frequently do better if you know more about the null-space.
  6. It is well-known that using the normal equations squares the condition number. So, if the least-squares problem has a condition number of 10^{8} then the normal equations will have a condition number 10^{16}, which is effectively singular in double-precision. If feasible, it would be great to compute the reduced SVD of the Jacobian matrix, which would provide a lot of information (e.g., condition number and null-space) and enable more robust solvers (e.g., pseudo-inverse). If this is not feasible, the QR factorization would be nice as well.
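The SVD-based diagnostics in item 6 can be sketched as follows (a minimal illustration using NumPy; the function name and tolerance are illustrative, not part of Pressio):

```python
import numpy as np

def jacobian_diagnostics(J, tol_ratio=1e-10):
    """Reduced SVD of J: condition number, numerical rank, null-space basis."""
    U, s, Vt = np.linalg.svd(J, full_matrices=False)
    cond = s[0] / s[-1] if s[-1] > 0 else np.inf
    rank = int(np.sum(s > tol_ratio * s[0]))  # numerical rank
    null_space = Vt[rank:].T                  # basis for the (near) null space
    return cond, rank, null_space

# Example: a nearly rank-deficient 3x2 Jacobian (two almost-parallel columns)
J = np.array([[1.0, 1.0],
              [1.0, 1.0 + 1e-12],
              [0.0, 0.0]])
cond, rank, ns = jacobian_diagnostics(J)
```

With the condition number and null-space basis in hand, one can also build a pseudo-inverse solve by truncating the small singular values, which is the "more robust solver" the comment alludes to.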
fnrizzi commented 3 years ago

@sdbond just to bring it up, when you have time after the things you are currently doing, can you please look into least squares solvers in the l1 norm? This is something both @eparish1 and @pjb236 brought up before and it might be useful for some problems. We can talk during the weekly meeting

sdbond commented 3 years ago

We could try using "iteratively reweighted least squares" (IRWLS), which uses the residual to set the weights in the objective. For example, if you minimize

R_1^2 + R_2^2 + R_3^2 + ...

you have standard least squares. If instead you minimize

w_1 R_1^2 + w_2 R_2^2 + w_3 R_3^2 + ...

you have weighted least squares. If w_1 = 1/|R_1|, w_2 = 1/|R_2|, ... then the objective is the L1 norm of the residual, since w_i R_i^2 = |R_i|. So the standard "iteratively reweighted least squares" algorithm just updates the weights at each step in the optimization. In some cases you may need to use w_i = 1/max{|R_i|, small number} to avoid division by zero.
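The scheme above can be sketched in a few lines (a NumPy illustration of IRWLS for the L1 objective with the safeguarded weights; names and the example problem are made up, not Pressio code):

```python
import numpy as np

def irls_l1(A, b, iters=50, eps=1e-8):
    """Minimize ||A x - b||_1 by iteratively reweighted least squares."""
    x = np.linalg.lstsq(A, b, rcond=None)[0]        # start from plain LSQ
    for _ in range(iters):
        r = A @ x - b
        # safeguarded weights w_i = 1 / max(|R_i|, eps), applied as sqrt(w)
        sw = np.sqrt(1.0 / np.maximum(np.abs(r), eps))
        x = np.linalg.lstsq(sw[:, None] * A, sw * b, rcond=None)[0]
    return x

# Example: fit a line to data with one gross outlier; the L1 fit
# stays on the true line while plain least squares gets pulled off.
t = np.arange(6.0)
A = np.column_stack([np.ones_like(t), t])
b = 1.0 + 2.0 * t
b[5] += 100.0                       # corrupt the last sample
x_l1 = irls_l1(A, b)
```

Each pass is just a weighted least-squares solve, which is why the method drops into a GN/LM loop with minimal changes.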

eparish1 commented 3 years ago

@sdbond @fnrizzi For iteratively reweighted least squares, it is also cool that we aren't restricted to L1, but can in general do an arbitrary "p norm", with p <= 1. This would be a nice feature. If we want to do purely L1, it seems there are more efficient approaches than iteratively reweighted least squares; not sure if anyone has experience with this. But it seems that IRWLS would be the way to go, at least for now, due to its flexibility in the choice of norm as well as its minimal modifications to GN/LM.

sdbond commented 3 years ago

SciPy has an implementation of IRWLS as part of its standard least-squares solver. It minimizes C^2 rho(r1^2/C^2) + C^2 rho(r2^2/C^2) + C^2 rho(r3^2/C^2) + ..., so by adjusting the rho function (called the loss function) and the C (called the f_scale) you can get many different variants. They have pre-defined loss functions, or you can write your own. In particular, if you use 'soft_l1' you get the one that people usually use for L1. The Huber loss function is also a popular choice for an approximate L1. For custom rho functions you provide functions that compute rho, the first derivative of rho, and the second derivative of rho.
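For concreteness, here is what that looks like with `scipy.optimize.least_squares` (the `loss` and `f_scale` keywords are the real SciPy parameters; the line-fitting problem is just an illustration):

```python
import numpy as np
from scipy.optimize import least_squares

t = np.arange(6.0)
y = 1.0 + 2.0 * t
y[5] += 100.0                        # one large outlier

def residuals(p):
    # r_i for the line model p[0] + p[1]*t
    return p[0] + p[1] * t - y

plain = least_squares(residuals, x0=[0.0, 0.0])          # ordinary LSQ
robust = least_squares(residuals, x0=[0.0, 0.0],
                       loss='soft_l1', f_scale=1.0)      # approximate L1
```

The `soft_l1` fit stays close to the true slope of 2, while the plain least-squares fit is dragged far off by the outlier.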

fnrizzi commented 3 years ago

@sdbond thanks for this! I will take a look at what they have. I think one ideal next step from here would be to look at our code and design what it would take to implement something similar to SciPy's. By "look" and "design" I mean figure out which pieces of what we have we can reuse and, based on that, draft a tentative design to support IRWLS. We can then review it, iterate until we are happy, and then implement it.

sdbond commented 3 years ago

I wrote up some notes on how to make the Rosenbrock problem harder. Making the first mu parameter 10^8 will cause the current Pressio test to fail. rosenbrock
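The notes themselves aren't attached here, but the conditioning effect of a large mu can be illustrated as follows. This sketch assumes the two-term Rosenbrock residual r(x) = (sqrt(mu)*(x2 - x1^2), 1 - x1); the residual form and parameter name are assumptions, and the actual Pressio test may differ:

```python
import numpy as np

def rosenbrock_jacobian(x, mu):
    # J[i][j] = d r_i / d x_j for r = (sqrt(mu)*(x2 - x1^2), 1 - x1)
    return np.array([[-2.0 * np.sqrt(mu) * x[0], np.sqrt(mu)],
                     [-1.0, 0.0]])

x_star = np.array([1.0, 1.0])           # minimizer of the Rosenbrock objective
J = rosenbrock_jacobian(x_star, mu=1e8)

cond_J = np.linalg.cond(J)              # grows with mu
cond_normal = np.linalg.cond(J.T @ J)   # normal equations square the condition
```

This also ties back to item 6 in the checklist above: whatever conditioning the large mu induces in the Jacobian, forming the normal equations squares it.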