Closed fnrizzi closed 2 years ago
@sdbond just to bring it up, when you have time after the things you are currently doing, can you please look into least squares solvers in the l1 norm? This is something both @eparish1 and @pjb236 brought up before and it might be useful for some problems. We can talk during the weekly meeting
We could try using "iteratively reweighted least squares", which uses the residual to set the weights in the objective. For example, if you have

J = R_1^2 + R_2^2 + ...

you have standard least squares. If you use

J = w_1 R_1^2 + w_2 R_2^2 + ...

you have weighted least squares. If w_1 = 1/|R_1|, w_2 = 1/|R_2|, ..., then the objective is the L1 norm of the residual. So the standard "iteratively reweighted least squares" algorithm just updates the weights at each step in the optimization. In some cases you may need to use w_i = 1/max{|R_i|, small number} to avoid division by zero.
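A minimal sketch of that loop for a linear residual R(x) = A x - b, assuming NumPy (the function name `irls_l1` and the iteration count are illustrative, not from any existing code):

```python
import numpy as np

def irls_l1(A, b, n_iter=50, eps=1e-8):
    """Approximate argmin_x ||A x - b||_1 by iteratively reweighted least squares."""
    x = np.linalg.lstsq(A, b, rcond=None)[0]   # start from the plain L2 solution
    for _ in range(n_iter):
        R = A @ x - b
        # w_i = 1 / max(|R_i|, small number), as described above
        w = 1.0 / np.maximum(np.abs(R), eps)
        # weighted least squares: minimize sum_i w_i R_i^2 by scaling rows by sqrt(w_i)
        sw = np.sqrt(w)
        x = np.linalg.lstsq(sw[:, None] * A, sw * b, rcond=None)[0]
    return x
```

For example, fitting a constant to data with one gross outlier, the L2 solution is the mean while this loop converges to the median, i.e. the L1 fit.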
@sdbond @fnrizzi For iteratively reweighted least squares, it is also cool that we aren't restricted to L1; in general we can do an arbitrary "p norm", with p <= 1. This would be a nice feature. If we want to do purely L1, it seems like there are more efficient approaches than iteratively reweighted least squares. Not sure if anyone has experience with this. But it seems that IRWLS would be the way to go, at least for now, due to its flexibility in the choice of norm as well as its minimal modifications to GN/LM.
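The p-norm generalization only changes the weight formula: since |R_i|^p = |R_i|^(p-2) R_i^2, the IRWLS weights become w_i = |R_i|^(p-2), with the same floor on |R_i| to avoid division by zero. A sketch of just that weight update, assuming NumPy (the helper name is illustrative):

```python
import numpy as np

def pnorm_weights(R, p, eps=1e-8):
    """IRWLS weights so that sum_i w_i R_i^2 approximates sum_i |R_i|^p."""
    # w_i = |R_i|^(p-2); p = 1 recovers the L1 weights w_i = 1/|R_i|
    return np.maximum(np.abs(R), eps) ** (p - 2)
```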
Scipy has an implementation of IRWLS as part of its standard least squares solver. It minimizes C^2 rho(r_1^2/C^2) + C^2 rho(r_2^2/C^2) + C^2 rho(r_3^2/C^2) + ..., so by adjusting the rho function (called the loss function) and the constant C (called f_scale) you can get many different variants. There are pre-defined loss functions, or you can write your own. In particular, 'soft_l1' gives the one people usually use for L1. The Huber loss function is also a popular choice for an approximate L1. For custom rho functions you provide a function that computes rho along with its first and second derivatives.
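For reference, here is what that looks like with `scipy.optimize.least_squares`; the line-fit model and the outlier data are just a toy example:

```python
import numpy as np
from scipy.optimize import least_squares

# Toy data: y = 2 x + 1 with one gross outlier
x_data = np.linspace(0.0, 1.0, 20)
y_data = 2.0 * x_data + 1.0
y_data[3] += 50.0  # this point would badly skew a plain L2 fit

def residuals(theta):
    return theta[0] * x_data + theta[1] - y_data

# loss='soft_l1' with f_scale=C gives the objective described above,
# with rho(z) = 2 (sqrt(1 + z) - 1)
sol = least_squares(residuals, x0=[0.0, 0.0], loss="soft_l1", f_scale=0.1)
```

The robust fit essentially ignores the outlier and recovers the slope and intercept from the 19 clean points.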
@sdbond thanks for this! I will take a look at what they have. I think one ideal next step from here would be to look at our code and design wht it would take to implement something similar to scipy. By "look" and "design" I mean figure out what pieces we can reuse of what we have and based on that, draft a tentative design to support the IRWLS. We can then review it and iterate until we are happy and then implement it.
I wrote up some notes on how to make the Rosenbrock problem harder. Making the first mu parameter 10^8 will cause the current Pressio test to fail.
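I don't know the exact parametrization used in the Pressio test, but as a hypothetical illustration of the same effect, take the classic two-residual Rosenbrock form with mu as the curvature weight; cranking mu up makes the valley extremely narrow and the least-squares problem much harder:

```python
import numpy as np
from scipy.optimize import least_squares

def rosenbrock_residuals(x, mu):
    # sum of squares = mu (x2 - x1^2)^2 + (1 - x1)^2; the classic case is mu = 100
    return np.array([np.sqrt(mu) * (x[1] - x[0] ** 2), 1.0 - x[0]])

# standard problem vs. a hardened one with a huge mu
easy = least_squares(rosenbrock_residuals, x0=[-1.2, 1.0], args=(100.0,))
hard = least_squares(rosenbrock_residuals, x0=[-1.2, 1.0], args=(1e8,))
```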
Some things to check, and some comments, for when there are problems with the nonlinear least-squares solver ...