MadNLP / MadNLP.jl

A solver for nonlinear programming

Support lazy Jacobian and Jacobian transposes #52

Open mohamed82008 opened 3 years ago

mohamed82008 commented 3 years ago

Does MadNLP currently support using a lazy Jacobian and Jacobian transpose? It should be possible to use an iterative solver for the linear system, which wouldn't require forming the Jacobian matrix at all.
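For illustration, here is a minimal sketch of what I mean by a lazy Jacobian (not MadNLP's API; LinearMaps.jl and Krylov.jl are just one possible choice of operator and solver packages, and the dense matrix below only stands in for whatever actually provides the products):

```julia
# Minimal sketch (not MadNLP's API): the Jacobian is available only through its
# action v -> J*v and w -> J'*w, and a Krylov method solves a least-squares
# problem with it without ever materializing J.
using LinearMaps, Krylov

n_con, n_var = 3, 5
A = randn(n_con, n_var)            # stands in for the true Jacobian; in practice
                                   # jprod/jtprod would come from AD or an adjoint solve
jprod(v)  = A * v                  # J * v
jtprod(w) = A' * w                 # J' * w

J = LinearMap{Float64}(jprod, jtprod, n_con, n_var)   # lazy operator, J never formed

b = randn(n_con)
x, stats = Krylov.lsmr(J, b)       # iterative solve using only J*v and J'*w
```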

sshin23 commented 3 years ago

Thanks for the suggestion @mohamed82008. This may require a major change in the MadNLP code, though.

I believe this is something @frapac is very much interested in 😊 Any thoughts, @frapac?

@mohamed82008 Could you point me to a reported practical use case of an iterative solver within an interior point method, or anything similar? Also, can this method be used with some kind of preconditioner?

mohamed82008 commented 3 years ago

@mohamed82008 Could you point me to a reported practical use case of an iterative solver within an interior point method, or anything similar? Also, can this method be used with some kind of preconditioner?

I have a potential use case in topology optimisation where J*v and J'*v are cheaper to compute than forming J itself. This, combined with an L-BFGS approximation of the Hessian, would be great to have.
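For example (a generic sketch, not my actual topology optimisation code), the products J*v and J'*w can be obtained without ever building J, with one forward-mode and one reverse-mode AD pass respectively:

```julia
# Sketch: matrix-free Jacobian products for a constraint function h: R^n -> R^m.
# Forward mode gives J*v, reverse mode gives J'*w; J itself is never formed.
using ForwardDiff, Zygote

h(x) = [sum(abs2, x) - 1.0, x[1] * x[2]]    # toy constraint function with m = 2

x = randn(4)
v = randn(4)                                 # direction in the input space
w = randn(2)                                 # direction in the output space

Jv = ForwardDiff.derivative(t -> h(x .+ t .* v), 0.0)   # J(x) * v
_, back = Zygote.pullback(h, x)
Jtw = first(back(w))                                     # J(x)' * w
```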

frapac commented 3 years ago

To the best of my knowledge, most interior point methods for generic NLPs require a full evaluation of the Jacobian. Even Knitro, which can compute the descent direction using only Hessian-vector products and iterative linear algebra, needs the full Jacobian.

The reason is well explained in "A matrix-free augmented Lagrangian algorithm with application to large-scale structural design optimization" by S. Arreckx, A. Lambe, J. R. R. A. Martins, and D. Orban, Section 4 (NB: this is the article behind Percival.jl):

If the bounds are enforced by way of a logarithmic barrier, as in, e.g., IPOPT (Wächter and Biegler, 2006) or KNITRO (Byrd et al., 2006), the subproblems are equality-constrained quadratic programs. KNITRO adds a trust-region constraint to those subproblems in order to guarantee global convergence and solves them by way of the projected conjugate-gradient method of Gould et al. (2001). Unfortunately, the latter requires accurate projections into the nullspace of the linear equality constraints and this is best achieved if the Jacobian is explicitly available.
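In other words (my paraphrase of that projection step), the projected CG method repeatedly projects residuals onto the nullspace of the constraint Jacobian J, i.e. it solves augmented systems of the form

$$
\begin{bmatrix} I & J^\top \\ J & 0 \end{bmatrix}
\begin{bmatrix} p \\ \lambda \end{bmatrix}
=
\begin{bmatrix} r \\ 0 \end{bmatrix},
\qquad\text{so that}\qquad
p = \big(I - J^\top (J J^\top)^{-1} J\big)\, r,
$$

and doing this accurately in practice relies on a factorization of the augmented matrix, hence the need for an explicit J.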

Finding a workaround is an open research question.

However, I think we could exploit the structure of your problem to avoid computing the full Jacobian J. With @sshin23, we are currently working on a mixed AugLag-IPM algorithm that might be a solution for your problem too. I would be happy to discuss that further with you.

mohamed82008 commented 3 years ago

Sounds great! Happy to discuss this further.

frapac commented 3 years ago

At the moment, we are mostly working on problems with the following shape:

min_{x,p}       f(x, p)
  subject to    g(x, p) = 0 ,   h(x, p) <= 0,  p >= 0

where g encodes the physical constraints of the problem and h the operational constraints. x is a state variable that is defined implicitly by the parameters p (the controls) through the nonlinear coupling g.

The idea is then to define x = x(p) as a function of the parameters and derive the following reduced problem:

min_{p}         f(x(p), p)
  subject to    h(x(p), p) <= 0,  p >= 0

which is easier to solve, since the overall dimension is reduced and only inequality constraints remain. The reduced gradient and reduced Hessian can be computed efficiently using AD (see my talk at JuliaCon). By moving the constraints into the objective with a set of augmented Lagrangian penalties, we get the following problem to solve:

min_{p, s}         f(x(p), p) + y' * (h(x(p), p) + s) + 1/2  \rho * | h(x(p), p) + s |^2 
  subject to       p >= 0, s >= 0

Since, for a fixed penalty \rho and multiplier y, the problem is bound constrained, it can be solved out of the box with MadNLP (an active-set method would be more appropriate, but we want to port the algorithm fully to the GPU and we are not sure an active-set method would play well in that setting).
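For reference, the reduced gradient mentioned above follows from the implicit function theorem (standard adjoint-method notation; writing F(p) = f(x(p), p) and differentiating g(x(p), p) = 0):

$$
\nabla_p F(p) \;=\; \nabla_p f \;+\; \Big(\frac{\partial g}{\partial p}\Big)^{\!\top} \lambda,
\qquad\text{where}\qquad
\Big(\frac{\partial g}{\partial x}\Big)^{\!\top} \lambda \;=\; -\,\nabla_x f,
$$

so each reduced gradient costs one adjoint solve with the state Jacobian of g; the reduced Hessian can be obtained analogously with second-order adjoints.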

Right now, we are implementing dense support in MadNLP, so that the dense reduced Hessian can be factorized on the GPU without too much data transfer. Then, we want to design a mixed AugLag-IPM algorithm that updates the penalty parameter rho, the multiplier y, and the barrier parameter mu at each iteration. This is still ongoing work, but we believe this method will play well for such reduced-space problems.

The framework is generic, though, and we would like to extend it to other applications. We are currently applying the reduced-space algorithm to optimal power flow problems, but the method applies to other domains as well: PDE-constrained optimization, topology optimization, optimal control, differentiable dynamic programming ... The possibilities are endless :p

mohamed82008 commented 3 years ago

Then, we want to design a mixed AugLag-IPM algorithm that updates the penalty parameter rho, the multiplier y, and the barrier parameter mu at each iteration. This is still ongoing work, but we believe this method will play well for such reduced-space problems.

I think this can be hacked directly from the user's side, at least in Ipopt. I thought about this a while ago. Instead of reporting the Hessian and gradient of the Lagrangian to Ipopt, you report those of the augmented Lagrangian. Run a few iterations, then update rho and y using the standard AugLag update and use a primal-dual warm start in the next run. That is essentially a hybrid IPM-AugLag algorithm. It's something I could offer in Nonconvex as a meta-algorithm that produces augmented versions of Ipopt/MadNLP.
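A rough sketch of that meta-algorithm (all names hypothetical, not an actual Nonconvex API; the projected-gradient inner loop is only a stand-in for a few warm-started Ipopt/MadNLP iterations with modified callbacks):

```julia
# Hybrid IPM-AugLag sketch: the inner solver only sees a bound-constrained problem
# whose objective is the augmented Lagrangian f + y'(h + s) + rho/2 * |h + s|^2
# (the same slack form as above, with s >= 0); rho and y are updated outside.
using LinearAlgebra, Zygote

auglag(f, h, x, s, y, rho) =
    f(x) + dot(y, h(x) .+ s) + 0.5 * rho * sum(abs2, h(x) .+ s)

# Stand-in for the inner solve: in the real hack this would be a few Ipopt/MadNLP
# iterations with the Lagrangian gradient/Hessian callbacks replaced by those of
# the augmented Lagrangian, warm-started in the primal-dual variables.
function inner_solve(obj, x, s; steps = 200, lr = 1e-2)
    for _ in 1:steps
        gx, gs = Zygote.gradient(obj, x, s)
        x = x .- lr .* gx
        s = max.(s .- lr .* gs, 0.0)        # project onto s >= 0
    end
    return x, s
end

function hybrid_auglag_ipm(f, h, x0; rho = 10.0, outer = 10)
    x = copy(x0)
    s = max.(-h(x), 0.0)                    # slacks so that h(x) + s ≈ 0 initially
    y = zero(h(x))
    for _ in 1:outer
        x, s = inner_solve((x, s) -> auglag(f, h, x, s, y, rho), x, s)
        y .+= rho .* (h(x) .+ s)            # standard first-order multiplier update
        rho *= 2.0                          # tighten the penalty
    end
    return x, s, y
end
```

(Whether the inner solver should also see the original constraints directly is exactly the question in the next paragraph.)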

Whether or not we enforce the constraints directly, by including the primal feasibility conditions, is a separate question. Your proposed algorithm suggests we don't; my approach above allows it. I think keeping those conditions in the IPM sub-solver is important to avoid violating the constraints too much, which helps regulate the optimization. The challenge, however, is that the Jacobian can be expensive to compute and then factorize. When even computing the Jacobian is too expensive, the question becomes: can we do better than AugLag? I think we can if J*v and J'*v are relatively inexpensive. Even when these operations are expensive, a reduced-order model can sometimes be used to compute these products approximately, e.g. using multigrid methods or dynamic mode decomposition. I haven't seen a lot of research combining reduced-order models, AD, and optimization algorithm design in this way, so I think this is a great research direction to take.