Closed: eparish1 closed this 3 years ago
@eparish1 A few things to clarify:

[1] What if we call/refer to J^T J as the hessian (since it is an approximation of it), and J^T R as the projected residual (maybe you have a better name)?

[2] In view of [1] above, the target system type is admissible if it meets the following conditions: it has these typedefs:
using scalar_type = ...;
using state_type = ...;
using hessian_type = ...;
using projected_residual_type = ...;
and these methods:
void computeHessianAndProjectedResidual(const state_type & state,
                                        hessian_type & JTJ,
                                        projected_residual_type & JTR) const;

hessian_type createHessianObject(const state_type & state) const{
  // create and return an empty object; it does not have to contain any real data.
  // It is only called once, to initialize members of the solver class.
}

projected_residual_type createProjectedResidualObject(const state_type & state) const{
  // create and return an empty object; it does not have to contain any real data.
  // It is only called once, to initialize members of the solver class.
}
[3] For the line search, we need to talk more, since some methods need the residual. What if we wanted to compute just the residual (or J^T R)? In that case, we should make the system support that too, maybe via another argument to the above method, or via a separate residual method. Basically, this forces the system class to either compute J^T J and J^T R together, or just R (or J^T R).
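One possible way to support that (just a sketch; the method name below is a placeholder, not settled API) would be an extra, optional method on the system:

// optional: compute only the projected residual (J^T R) for a given state,
// so that e.g. a line search does not need to form J^T J
void computeProjectedResidual(const state_type & state,
                              projected_residual_type & JTR) const;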
Regarding converged_when_tag, does it actually pay for this to be a compile-time choice? Why not make it a run-time choice?
Regarding converged_when_tag, does it actually pay for this to be a compile-time choice? Why not make it a run-time choice?
@mhoemmen Good point! Right now, I made this choice so one would do:
using converged_when_t = solvers::iterative::converged_when::completingNumMaxIters;
using gn_t = solvers::iterative::GaussNewton<linear_solver_t, converged_when_t, problem_t, hessian_t>;
gn_t GNSolver(problem, x, linSolver);
// problem is the problem object, x is the initial state here, linSolver is the linear solver object
One caveat (which I think is what you were thinking about) is that this implies a fixed convergence method for a given solver object, so if you want to vary the convergence condition, you need to create a different solver object. I agree this might be annoying. However, this choice was just a bit more convenient (for now): making it a run-time choice that is queried when the actual solve happens would require passing the convergence condition through the various layers needed to construct a ROM problem. But I agree that it would be ideal (and at some point we should) to support both options.
The other thing I want to add is that currently one can only choose among a few convergence criteria specified within pressio. At some point, this should be made more flexible by allowing the user to pass a callable that sets the convergence condition, with pressio performing a compile-time analysis of the callable type to make sure it meets the right API. This kind of flexibility has not been super urgent (at least so far), which is why it is missing, but I would love for this to be done properly at some point :)
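For illustration only, a user-supplied convergence callable could look roughly like this (a sketch of the idea, not existing pressio API; the signature and the quantities passed to it are assumptions):

#include <functional>

// hypothetical: the solver would invoke this after each iteration and stop
// as soon as it returns true
template <typename scalar_t>
using convergence_criterion_t =
  std::function<bool(int iteration,
                     scalar_t gradientNorm,
                     scalar_t correctionNorm)>;

// example of a criterion a user might pass:
// convergence_criterion_t<double> crit =
//   [](int it, double gNorm, double dxNorm){ return gNorm < 1e-8 || it >= 50; };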
[1] For the naming convention, I think they are both ok. I personally don't mind JTJ and JTR because I think they are descriptive (but that's just me). Calling JTJ the Hessian is definitely reasonable (especially since that's what you currently have). For JTR, I don't love projected_residual as J^TR is not really a projected residual. To be consistent with Hessian, I would call J^T R the gradient.
[2] I agree with this, although is there a reason that we are doing a "createHessianObject" rather than having a "void" and a "return" equivalent of "computeHessianAndProjectedResidual"?
[3] Yes, this is a good point with the line search. It would be useful to have a method that computes the residual norm (not the whole residual). It is a little annoying to have a method that is required only for the line search, however. Maybe we can't really get around it though.
[1] For the naming convention, I think they are both ok. I personally don't mind JTJ and JTR because I think they are descriptive (but that's just me). Calling JTJ the Hessian is definitely reasonable (especially since that's what you currently have). For JTR, I don't love projected_residual as J^TR is not really a projected residual. To be consistent with Hessian, I would call J^T R the gradient.
Ok, I like the gradient... :)

The reason I don't like JTJ and JTR is that these are just letters. If somebody has a formulation that relies on denoting things differently, e.g. \zeta for the residual, then we are off (probably that will never happen, but just in case). I would argue that for generality and expressiveness, we should use names that convey the concept/meaning of things, not letters. For the same reason, we don't call computeResidual something like computeR. So hessian and gradient are nice. I would even say approximate hessian and gradient, what do you think of that?

[2] I agree with this, although is there a reason that we are doing a "createHessianObject" rather than having a "void" and a "return" equivalent of "computeHessianAndProjectedResidual"?
Yes, to follow along the residual API, and to clearly tell any user (or even ourselves) that the create... methods are only meant to create an object with no useful data in it, and to convey that these will only be called once. Makes sense?

[3] Yes, this is a good point with the line search. It would be useful to have a method that computes the residual norm (not the whole residual). It is a little annoying to have a method that is required only for the line search, however. Maybe we can't really get around it though.
I agree. So you think just a method for the norm?
using scalar_type = ...;
using state_type = ...;
using hessian_type = ...;
using gradient_type = ...;
//...
void computeHessianAndGradient(const state_type & state,
                               hessian_type & JTJ,
                               gradient_type & JTR,
                               const ::pressio::solvers::Norm & normType,
                               scalar_type & residualNorm) const;

hessian_type createHessianObject(const state_type & state) const{
  // create and return an empty object; it does not have to contain any real data.
  // It is only called once, to initialize members of the solver class.
}

gradient_type createGradientObject(const state_type & state) const{
  // create and return an empty object; it does not have to contain any real data.
  // It is only called once, to initialize members of the solver class.
}
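To make the intended usage concrete, here is a minimal sketch of a system class meeting the API above; the Eigen types and the member bodies are purely illustrative assumptions, and the include for the pressio Norm enum is omitted:

#include <Eigen/Dense>

struct MySystem
{
  using scalar_type   = double;
  using state_type    = Eigen::VectorXd;
  using hessian_type  = Eigen::MatrixXd;
  using gradient_type = Eigen::VectorXd;

  hessian_type createHessianObject(const state_type & state) const{
    // placeholder object, only used once to initialize the solver's members
    return hessian_type(state.size(), state.size());
  }

  gradient_type createGradientObject(const state_type & state) const{
    return gradient_type(state.size());
  }

  void computeHessianAndGradient(const state_type & state,
                                 hessian_type & JTJ,
                                 gradient_type & JTR,
                                 const ::pressio::solvers::Norm & normType,
                                 scalar_type & residualNorm) const
  {
    // application-specific: evaluate J and r (or their action) at this state,
    // then set JTJ = J^T J, JTR = J^T r, and fill residualNorm per normType
  }
};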
However, this choice was just a bit more convenient (for now): making it a run-time choice that is queried when the actual solve happens would require passing the convergence condition through the various layers needed to construct a ROM problem.
Designing convergence tests and nested solvers is tricky. You might think you're sufficiently general, but then some new kind of solver comes along that breaks your interface assumptions.
In general, convergence criteria need run-time information. For example, users might want to supply a custom norm or inner product (ROL has this issue). An outer solver might want to control behavior of successive inner solves based on previous inner solves' results (e.g., with inexact Krylov methods, or Newton-Krylov). Users might want to combine convergence criteria (e.g., Belos' GMRES wants both the implicitly computed and explicitly computed residual norms to satisfy the tolerance).
@mhoemmen I agree with you! I think what we have now can be made more general, and making it also a run-time option seems good to have. Btw, now that I am thinking about it, I think the current design does not prevent us from accessing run-time information; it just forces the user to choose the criterion upfront (which I agree might not be ideal).
If we want to be able to integrate the Wolfe conditions, we should support a method for computing r(x_1)^T J(x_2), where r is the residual, J is the Jacobian, and x_1, x_2 are two instances of the optimization variables (e.g., the optimization variable at the current step and the guess for the next iteration). This leads to storing a vector in memory that scales with the size of the optimization variable, which is doable. So I'd say we could have two additional methods:
1.) residual_norm(x)
2.) jacobianTransposeResid(x_1, x_2)
Obviously I think it is important to not require both of these for the solver, but only for the line search (e.g., the solver will run if none are provided, but can't use the line search). Just a side note, we should be careful with how the line search is implemented for the case of hyper reduction and different norms, as these then change the Wolfe conditions.
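As a sketch, those two optional methods could be declared as follows (the return types and output arguments shown here are assumptions, not settled API):

// 1.) norm of the residual at a given state (needed by the line search)
scalar_type residual_norm(const state_type & x) const;

// 2.) computes J(x_2)^T r(x_1), i.e. the Jacobian at x_2 applied (transposed)
//     to the residual at x_1, as needed by the Wolfe conditions
void jacobianTransposeResid(const state_type & x_1,
                            const state_type & x_2,
                            gradient_type & result) const;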
We need a new Gauss-Newton solver for an efficient WLS implementation. The solver should be the same as the current GN solver with the following differences:
1.) Instead of having routines to compute J, r, J^T J, and J^T R, we have one single routine that computes J^T J and J^T R; i.e., the "system" should have a single member for this.
The reason for this is so that we can compute these products efficiently for different systems.
2.) It is important to avoid computing anything that stores an entire "windowed" FOM vector (i.e. the residual over the window).
The sample code would look like
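(a rough sketch, assuming the computeHessianAndGradient-style API from the discussion above; type names and signatures are illustrative, not the final design)

using scalar_type   = ...;
using state_type    = ...;
using hessian_type  = ...;
using gradient_type = ...;

hessian_type createHessianObject(const state_type & wlsState) const;
gradient_type createGradientObject(const state_type & wlsState) const;

// single routine forming J^T J and J^T R directly, without ever storing
// the full windowed residual or Jacobian
void computeHessianAndGradient(const state_type & wlsState,
                               hessian_type & JTJ,
                               gradient_type & JTR,
                               const ::pressio::solvers::Norm & normType,
                               scalar_type & residualNorm) const;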