Pressio / pressio

core C++ library

New Gauss-Newton Solver #45

Closed eparish1 closed 3 years ago

eparish1 commented 4 years ago


We need a new Gauss-Newton solver for an efficient WLS implementation. The solver should be the same as the current GN solver with the following differences:

1.) Instead of having separate routines to compute J, r, J^T J, and J^T R, we have one single routine that computes J^T J and J^T R; i.e., the "system" should have a member

void JTJ_and_JTR(const state_t & state, JTJ_type & JTJ, JTR_type & JTR) const;

The reason for this is so that we can compute these products efficiently for different systems.

2.) It is important to avoid computing or storing anything the size of an entire "windowed" FOM vector (i.e., the residual over the whole window).

The sample code would look like:

namespace pressio{ namespace solvers{ namespace iterative{ namespace impl{
template <
  typename line_search_t,
  typename converged_when_tag,
  typename system_t,
  typename lin_solver_t,
  typename iteration_t,
  typename scalar_t,
  typename observer_t = utils::impl::empty
  >
void gauss_newton_neq_solve_general(const system_t & sys,
          typename system_t::state_type & y,
          typename system_t::state_type & ytrial,
          typename system_t::JTR_type & JTR, // this should be the same as state_type
          typename system_t::JTJ_type & JTJ,
          typename system_t::state_type & dy,
          lin_solver_t & linSolver,
          iteration_t maxNonLIt,
          scalar_t tolerance,
          scalar_t & normO,
          scalar_t & norm_dy,
          const observer_t * observer,
          std::string & convCondDescr)
{

  // find out which norm to use
  using norm_t = typename NormSelectorHelper<converged_when_tag>::norm_t;

  // policy for evaluating the norm of a vector
  using norm_evaluator_t = ComputeNormHelper<norm_t>;

  // policy to checking convergence
  using is_converged_t = IsConvergedHelper<converged_when_tag>;

  /* policy for computing line search factor (alpha) such that
   * the update is done with y = y + alpha dy
   * alpha = 1 default when user does not want line search
   */
  using lsearch_helper_t = LineSearchHelper<line_search_t>;

  //-------------------------------------------------------

  // alpha for taking steps
  scalar_t alpha = {};
  // storing projected residual norm
  scalar_t normJTRes = {};
  scalar_t normJTRes0 = {};

  constexpr auto one = ::pressio::utils::constants::one<scalar_t>();
  convCondDescr = std::string(is_converged_t::description_);

  // compute the initial norm of y (the state)
  norm_evaluator_t::evaluate(y, normO);
  norm_dy = {0};

  iteration_t iStep = 0;
  while (++iStep <= maxNonLIt)
  {

    // compute everything in one go
    sys.JTJ_and_JTR(y, JTJ, JTR);

    JTR.scale(static_cast<scalar_t>(-1));
    // projected residual norm for current state
    norm_evaluator_t::evaluate(JTR, normJTRes);

    // store initial residual norm
    if (iStep==1) normJTRes0 = normJTRes;

    // solve normal equations
    linSolver.solveAllowMatOverwrite(JTJ, JTR, dy);

    // compute norm of the correction
    norm_evaluator_t::evaluate(dy, norm_dy);

    // exit with error if NaNs detected in solution update dy
    if (std::isnan(norm_dy))
    {
      throw std::runtime_error(
        "Nonlinear solver: NEQ-based Gauss-Newton: NaNs detected in solution update dy");
    }

    // compute multiplicative factor if needed
    // Eric's comment: not sure about how this is done in the code currently, 
    // but we only want JTJ and JTR as inputs. The line search would then require ||JTR|| to 
    // decrease
    lsearch_helper_t::evaluate(alpha, y, ytrial, dy, JTR, JTJ, sys);

    // solution update: y = y + alpha*dy
    ::pressio::containers::ops::do_update(y, one, dy, alpha);

    // check convergence (whatever method user decided)
    const auto flag = is_converged_t::evaluate(y, dy,
                  norm_dy, normJTRes, normJTRes0,
                  iStep, maxNonLIt, tolerance);
    if (flag) break;

    // store new norm into old variable
    normO = norm_dy;

  }//loop

}// end

}}}} //end namespace pressio::solvers::iterative::impl
fnrizzi commented 4 years ago

@eparish1 A few things to clarify: [1] what about we call/refer to J^T J as the "hessian" and J^T R as the "projected residual"?

[2] in view of 1 above, the target system type is admissible if it meets the following conditions: it has these typedefs:

using scalar_type = ...;
using state_type = ...;
using hessian_type = ...;
using projected_residual_type = ...;

and these methods:

void computeHessianAndProjectedResidual(const state_type & state,
                                        hessian_type & JTJ,
                                        projected_residual_type & JTR) const;

hessian_type createHessianObject(const state_type & state) const{
  // create and return an empty object; it does not have to contain any real data.
  // It is only called once, to initialize members of the solver class.
}

projected_residual_type createProjectedResidualObject(const state_type & state) const{
  // create and return an empty object; it does not have to contain any real data.
  // It is only called once, to initialize members of the solver class.
}

[3] for the line search, we need to talk more, since some methods need the residual. What if we wanted to compute just the residual (or J^T R)? In that case, we should make the system method support that too, maybe using another argument to the above method, or having a separate residual method: basically forcing the system class to either compute J^T J and J^T R together, or just R (or J^T R).
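
To make this concrete, a rough sketch of the extra-argument variant (the enum and method names here are hypothetical, just for illustration):

// hypothetical sketch: a mode flag telling the system which operators to
// compute, so we never form more than the caller needs
enum class OperatorsMode { hessianAndProjectedResidual, projectedResidualOnly };

void computeOperators(const state_type & state,
                      hessian_type & JTJ,
                      projected_residual_type & JTR,
                      OperatorsMode mode) const;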

mhoemmen commented 4 years ago

Regarding converged_when_tag, does it actually pay for this to be a compile-time choice? Why not make it a run-time choice?

fnrizzi commented 4 years ago

Regarding converged_when_tag, does it actually pay for this to be a compile-time choice? Why not make it a run-time choice?

@mhoemmen Good point! Right now, I made this choice so one would do:

using converged_when_t = solvers::iterative::converged_when::completingNumMaxIters;
using gn_t = solvers::iterative::GaussNewton<linear_solver_t, converged_when_t, problem_t, hessian_t>;
gn_t GNSolver(problem, x, linSolver);
// problem is the problem object, x is the initial state here, linSolver is the linear solver object

One caveat (which I think is what you thought about) is that this implies a fixed convergence method for a solver object, so if you want to vary the convergence condition, you need to create a different solver object. I agree this might be annoying. However, this choice was a bit more convenient (for now): making it a run-time choice queried when the actual solve happens would mean passing the convergence condition through the various layers needed to construct a ROM problem. But I agree that it would be ideal (and at some point we should) to support both options.

The other thing I want to add is that currently one can only choose among a few convergence criteria specified within pressio. At some point, this should be made more flexible by allowing the user to pass a callable that sets the convergence condition, with pressio performing a compile-time analysis of the callable type to make sure it meets the right API. This kind of flexibility has not been super urgent (at least so far), which is why it is missing, but I would love for this to be done properly at some point :)
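
For the record, a minimal sketch of what that compile-time analysis could look like (hypothetical, not current pressio API; assumes C++17 and a criterion invoked with the update norm and the tolerance):

#include <type_traits>

// hypothetical sketch: accept any user-provided convergence criterion and
// verify at compile time that it is callable with the expected signature
template <typename scalar_t, typename callable_t>
void setConvergenceCriterion(callable_t && isConverged)
{
  static_assert(
    std::is_invocable_r<bool, callable_t, scalar_t /*normDy*/, scalar_t /*tol*/>::value,
    "convergence criterion must be callable as bool(scalar_t, scalar_t)");
  // ... store the callable and query it at each nonlinear iteration
}

// usage, e.g.:
// setConvergenceCriterion<double>([](double normDy, double tol){ return normDy < tol; });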

eparish1 commented 4 years ago

[1] For the naming convention, I think they are both ok. I personally don't mind JTJ and JTR because I think they are descriptive (but that's just me). Calling JTJ the Hessian is definitely reasonable (especially since that's what you currently have). For JTR, I don't love projected_residual as J^TR is not really a projected residual. To be consistent with Hessian, I would call J^T R the gradient.

[2] I agree with this, although is there a reason we are doing a "createHessianObject" rather than having a "void" and a "return" equivalent of "computeHessianAndProjectedResidual"?

[3] Yes, this is a good point with the line search. It would be useful to have a method that computes the residual norm (not the whole residual). It is a little annoying to have a method that is required only for the line search, however. Maybe we can't really get around it though.

fnrizzi commented 4 years ago

[1] For the naming convention, I think they are both ok. I personally don't mind JTJ and JTR because I think they are descriptive (but that's just me). Calling JTJ the Hessian is definitely reasonable (especially since that's what you currently have). For JTR, I don't love projected_residual as J^TR is not really a projected residual. To be consistent with Hessian, I would call J^T R the gradient.

Ok, I like the gradient... :) The reason I don't like JTJ and JTR is that these are just letters. If somebody has a formulation that relies on denoting things differently, e.g. \zeta for the residual, then we are off (probably that will never happen, but just in case). I would argue that, for generality and expressiveness, we should use names that convey the concept/meaning of things, not letters. For the same reason, we don't call computeResidual something like computeR. So hessian and gradient are nice. I would even say approximate hessian and gradient; what do you think of that?

[2] I agree with this, although is there a reason that we are doing a "createHessianObject" rather than having a "void" and a "return" equivalent of "computeHessianAndProjectedResidaul"?

Yes, to follow the residual API, and to clearly tell any user (or even ourselves) that the create... methods are just meant to create an object with no useful data in it, and to convey the idea that these will only be called once. Makes sense?

[3] Yes, this is a good point with the line search. It would be useful to have a method that computes the residual norm (not the whole residual). It is a little annoying to have a method that is required only for the line search, however. Maybe we can't really get around it though.

I agree. So you think just a method for the norm?

fnrizzi commented 4 years ago

using scalar_type = ...;
using state_type = ...;
using hessian_type = ...;
using gradient_type = ...; 
//...
void computeHessianAndGradient(const state_type & state,
                               hessian_type & JTJ,
                               gradient_type & JTR,
                               const ::pressio::solvers::Norm & normType,
                               scalar_type & residualNorm) const;

hessian_type createHessianObject(const state_type & state) const{
  // create and return an empty object; it does not have to contain any real data.
  // It is only called once, to initialize members of the solver class.
}

gradient_type createGradientObject(const state_type & state) const{
  // create and return an empty object; it does not have to contain any real data.
  // It is only called once, to initialize members of the solver class.
}
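
To illustrate, a toy system satisfying this API could look like the following (a sketch only; std::vector stands in for the real pressio containers, and the Norm type is the one from the signature above):

#include <vector>

struct MyWlsSystem
{
  using scalar_type   = double;
  using state_type    = std::vector<double>;
  using hessian_type  = std::vector<double>; // dense J^T J, row-major
  using gradient_type = std::vector<double>; // J^T R

  // called once, only to size the solver's internal objects
  hessian_type createHessianObject(const state_type & state) const {
    return hessian_type(state.size() * state.size());
  }

  gradient_type createGradientObject(const state_type & state) const {
    return gradient_type(state.size());
  }

  void computeHessianAndGradient(const state_type & state,
                                 hessian_type & JTJ,
                                 gradient_type & JTR,
                                 const ::pressio::solvers::Norm & normType,
                                 scalar_type & residualNorm) const
  {
    // fill JTJ = J^T J and JTR = J^T R here, accumulating over the window
    // so the full "windowed" residual is never stored, and set
    // residualNorm = ||R|| in the requested norm
  }
};
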
mhoemmen commented 4 years ago

However, this choice was a bit more convenient (for now): making it a run-time choice queried when the actual solve happens would mean passing the convergence condition through the various layers needed to construct a ROM problem.

Designing convergence tests and nested solvers is tricky. You might think you're sufficiently general, but then some new kind of solver comes along that breaks your interface assumptions.

In general, convergence criteria need run-time information. For example, users might want to supply a custom norm or inner product (ROL has this issue). An outer solver might want to control behavior of successive inner solves based on previous inner solves' results (e.g., with inexact Krylov methods, or Newton-Krylov). Users might want to combine convergence criteria (e.g., Belos' GMRES wants both the implicitly computed and explicitly computed residual norms to satisfy the tolerance).
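
As a purely illustrative sketch, the "combine criteria" case could be handled at run time by a simple predicate aggregator (all names hypothetical):

#include <functional>
#include <vector>

// hypothetical sketch: require every registered criterion to pass,
// in the spirit of the Belos GMRES example above
template <typename scalar_t>
struct AllCriteria
{
  using criterion_t = std::function<bool(scalar_t /*norm*/, scalar_t /*tol*/)>;
  std::vector<criterion_t> criteria_;

  bool operator()(scalar_t norm, scalar_t tol) const {
    for (const auto & c : criteria_)
      if (!c(norm, tol)) return false;
    return true;
  }
};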

fnrizzi commented 4 years ago

@mhoemmen I agree with you! I think what we have now can be made more general, and also supporting a run-time option seems good to have. Btw, now that I am thinking about it, the current design does not prevent us from accessing run-time information; it just forces the user to choose the criterion upfront (which, I agree, might not be ideal).

eparish1 commented 4 years ago

If we want to be able to integrate the Wolfe conditions, we should support a method for computing r(x_1)^T J(x_2), where r is the residual, J is the Jacobian, and x_1, x_2 are two instances of the optimization variables (e.g., the optimization variable at the current step and the guess for the next iteration). This requires storing a vector whose memory scales with the size of the optimization variable, which is doable. So I'd say we could have two additional methods:

1.) residual_norm(x)
2.) jacobianTransposeResid(x_1, x_2)

Obviously I think it is important not to require both of these for the solver, but only for the line search (e.g., the solver will run if neither is provided, but then it can't use the line search). Just a side note: we should be careful with how the line search is implemented for the case of hyper-reduction and different norms, as these change the Wolfe conditions. A sketch of how these two methods could be used is below.
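
For context, here is a sketch of how those two methods could feed a strong-Wolfe check, using f(x) = 0.5 ||r(x)||^2 so that grad f(x) = J(x)^T r(x). Everything here is hypothetical (including the dot helper); it only illustrates the intended use:

#include <cmath>
#include <numeric>

// minimal inner-product helper for vector-like containers (illustrative)
template <typename vec_t>
typename vec_t::value_type dot(const vec_t & a, const vec_t & b)
{
  return std::inner_product(a.begin(), a.end(), b.begin(),
                            typename vec_t::value_type{});
}

// hypothetical sketch: strong Wolfe conditions built only from
// residual_norm and jacobianTransposeResid, for f(x) = 0.5*||r(x)||^2
template <typename sys_t, typename state_t, typename scalar_t>
bool strongWolfeHold(const sys_t & sys,
                     const state_t & x, const state_t & xTrial,
                     const state_t & dy, scalar_t alpha,
                     scalar_t c1, scalar_t c2)
{
  const scalar_t fX = 0.5 * std::pow(sys.residual_norm(x), 2);
  const scalar_t fT = 0.5 * std::pow(sys.residual_norm(xTrial), 2);

  // gradients g = J^T r, obtained by passing x_1 = x_2
  const auto gX = sys.jacobianTransposeResid(x, x);
  const auto gT = sys.jacobianTransposeResid(xTrial, xTrial);

  const scalar_t slope0 = dot(gX, dy); // directional derivative at x
  const scalar_t slopeT = dot(gT, dy); // directional derivative at trial

  const bool sufficientDecrease = fT <= fX + c1 * alpha * slope0;
  const bool curvature          = std::abs(slopeT) <= c2 * std::abs(slope0);
  return sufficientDecrease && curvature;
}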