Is there a fresher reference for how HSD implementation currently works?

pratyai commented 2 months ago

Currently, the reference paper for Tulip is this AFAIK. However, I see some differences in the actual implementation, as follows:

The number of times the function KKT.solve!() is called is significantly different. There is a "preliminary" solve even before the predictor step to compute the quantity h0 (which is not explained anywhere, and I'm trying to work out its contribution).
Also, KKT.solve!() is called only once in each predictor or corrector step, whereas the paper says the augmented system needs to be solved twice in each solution of the newton step (predictor or corrector).

I looked a bit through the commit history and their messages are not particularly insightful on this matter either. I imagine that these differences were introduce organically during the development of Tulip.

Has there been any paper / documentation / publicly accessible communication for the differences or how the current implementation is formulated?

Thanks!

mtanneau commented 2 months ago

Thanks for using Tulip and for reaching out about this :)

In what follows, all the equations number refer to the arxiv preprint.

At the mathematical level, the implementation follows the formulation in the paper (Eq. (1)), except that the standard form used in the code considers explicit lower and upper bounds on variables (see this part of the docs)
The overall HSD algorithm is presented in Section 3.3, and the computation of the search direction is in Section 3.3.2. It consists of solving several Newton systems, which all have the same left-hand side matrix:
- affine scaling direction: (28)-(32)
- centrality correction: (35)-(39)
- additional centrality corrections: (45)-(49)
The discrepancy in the number of calls to KKT.solve! you're seeing comes from how the Newton systems are solved under the hood. This part is detailed in Section 4. Namely, we reduce the Newton system to a so-called augmented system (63).
- To solve system (63), we solve two smaller systems (64) and (65).
- As is mentioned in the paper just after Eq. (68), the system (64) is the same for all Newton systems, hence it only needs to be solved once per IPM iteration. This is what happens here. The quantities hx, hy in the code correspond to p, q in the paper's Eq. (64), and h0 corresponds to the denominator of Eq (66)
- Afterwards, we compute the affine-scaling direction here, the centrality correction here, and additional centrality corrections here. Each of these calls solve_newton_system! once, and you'll note that this function is passed hx, hy, h0 as arguments, which were obtained from the solution of (64).

The rationale behind most of this was that 1) the code was structured around a KKT module that solves augmented systems, and 2) I tried to minimize the number of redundant linear system solves.

pratyai commented 2 months ago

Thank you so much! Indeed the matter of the number of calls to KKT.solve! is quite clear after your explanation. I missed the fact that (64) is fixed for all the steps, and the preliminary step is computing just that. The hx, hy, h0 variables are also clear now.

ds4dm / Tulip.jl

Is there a fresher reference for how HSD implementation currently works? #148