artofscience / SAOR

Sequential Approximate Optimization Repository
GNU General Public License v3.0

Implement dual solver #20

Open artofscience opened 3 years ago

artofscience commented 3 years ago

Assign: dirk

artofscience commented 2 years ago

Dirk has implemented several dual solvers for specific "commonly used" combinations of intervening variables (e.g. MMA and CONLIN). It would be nice to add those to the framework as well, to broaden the perspective and give fast solutions for specific problems (e.g. pure MMA).

dirkmunro89 commented 2 years ago

Not exactly sure about everything you say here; but, in general: dual solvers are perfectly elegant and efficient if you have a small number of constraints. I don't understand why this was not done / prioritized at the beginning.... There are good theoretical reasons for it too, particularly in combination with intervening variable frameworks.... This makes me very unsure about how all the intervening variables / approximations were implemented / integrated in the subsolves....

artofscience commented 2 years ago

@dirkmunro89 AFAIK dual solvers solve the dual problem for specific intervening variables. In short: changing intervening variables means changing the solver (to some extent?), which means some loss of modularity in my view. However, it would definitely have been good to start with the dual solvers (also to check our pdip solver!). In the end we did not implement this so far, as we knew you would be the best fit for it (hence the assignment to you). It would be nice to still have the dual implementations for some common use cases (SLP, CONLIN, MMA).

About "how all the intervening variables/approx are integrated in the subsolves": we first build the analytical approximation functions using the approximation and intervening_variable objects. Nex the subproblem by recursively calling the analytical approximated function and its derivatives. Hence, the subsolve is the same for all different subproblems.

dirkmunro89 commented 2 years ago

Stijn; the modularity comes from the general Taylor approximation, cast in QP form; this is the whole point; do you understand this? I am concerned. What you write / the distinction you draw doesn't make sense to me.... you have to explain to me how that subsolve works... I can't see how you are ensuring convexity / separability, and it almost sounds like you are missing the point of SAO...

dirkmunro89 commented 2 years ago

*the (dual or whatever) QP subsolve is then completely modular: this is the whole point of the approximated approximations framework.

aatmdelissen commented 2 years ago

The convexity and separability are ensured in the Approximation and Intervening objects. They implement the generic expression g(y(x)), for instance g(y) can be Taylor1 and y(x) can be MMA, respectively being Approximation and Intervening objects.
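To make that concrete, here is a minimal sketch of the composition (class names and signatures are illustrative, not necessarily the actual SAOR API): a scalar response approximated first-order in y, with an MMA-style lower-asymptote intervening variable.

```python
import numpy as np

# Illustrative sketch only; the real SAOR classes may differ.
class MMALower:
    """MMA-style intervening variable y_i = 1/(x_i - L_i), lower asymptote L_i < x_i."""
    def __init__(self, low):
        self.low = np.asarray(low)

    def y(self, x):
        return 1.0 / (x - self.low)

    def dydx(self, x):
        return -1.0 / (x - self.low) ** 2

class Taylor1:
    """First-order Taylor approximation of a scalar response g in terms of y."""
    def __init__(self, inter, x0, g0, dgdx0):
        self.inter = inter
        self.y0 = inter.y(x0)
        self.g0 = g0
        self.dgdy = dgdx0 / inter.dydx(x0)  # chain rule: dg/dy_i = (dg/dx_i)/(dy_i/dx_i) at x0

    def g(self, x):
        # The generic expression g(y(x)); exact value and gradient at x0 by construction.
        return self.g0 + self.dgdy @ (self.inter.y(x) - self.y0)

    def dgdx(self, x):
        return self.dgdy * self.inter.dydx(x)
```

Because a subsolver only calls g(x) and dgdx(x), swapping MMALower for another Intervening object changes the approximation but not the subsolver interface.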

artofscience commented 2 years ago

*the (dual or whatever) QP subsolve is then completely modular: this is the whole point of the approximated approximations framework.

But this is only true for the approximated approximations (then you can use the QP dual subsolve); for other approximations (e.g. MMA) you need an "MMA-specific" dual solve. E.g. from your library: you need a different solver per SAO scheme; that is, the approximation you choose and the solver are linked, right? (E.g. you cannot make a CONLIN approximation and solve it with the MMA dual solver.)

artofscience commented 2 years ago

Stijn; the modularity comes from the general Taylor approximation, cast in QP form; this is the whole point; do you understand this? I am concerned. What you write / the distinction you draw doesn't make sense to me.... you have to explain to me how that subsolve works... I can't see how you are ensuring convexity / separability, and it almost sounds like you are missing the point of SAO...

In short, we use the SCPIP solver from Zillober, but we wrote it ourselves; see here

dirkmunro89 commented 2 years ago

Ok cool; like I say, it would just be cool to discuss the details a bit.... I understand that you have not had success with the intervening variables / approximations framework; I am just trying to figure out why... (because we are fairly confident that it works, from history, and I am doing examples right now in a simpler code, to sanity check again...)

As far as I know SCPIP relies on convexity and separability of the approximations, which is ensured by construction in the MMA approximations; yes? As far as I know, this is not easy to guarantee in general; it is guaranteed by construction in the CONLIN and MMA form, and it is very easy to guarantee via Taylor and approximated approximations in general....

On MMA and CONLIN; yes, in the traditional form, it was specifically built with the dual solve in mind. It was built on purpose so that it can be written into an efficient dual solve. However, when you apply a primal-dual method to it, although you don't have to write down the analytical form of x as a function of \lambda (which is the key of the dual solve), you still have to ensure convexity somehow, by construction....

I find it strange: you are pushing for modularity, and very simplified constructions (LP + AML), but the simplest, most general and modular one (QP + AML) is ignored or treated with skepticism: 'BUT this is only true for approximated approximations': no; it is a fantastic generalization, making it true and simple for anything you wish to construct....

dirkmunro89 commented 2 years ago

And FYI: don't look too much at my subsolves in my repo; I have barely started; I am busy testing and verifying; I have not even implemented the proper general QP form.

aatmdelissen commented 2 years ago

I don't know much about the specifics of all different forms. But our intention is not to make a 'fool-proof' framework; any random combination of Approximation, Intervening variable, MoveLimit, and Subsolver might be cooked up, but there is no guarantee that each and every combination is going to work/converge, for the reasons you are mentioning. We merely provide the tools to experiment around in a structured way.

artofscience commented 2 years ago

@dirkmunro89,

Giannis and I put the following together.

The assumption is that m < n, such that a dual solver might be useful.

This is our view on the different approximation function forms and the consequences for solver choice (here iterative primal-dual vs. analytical dual).

Target: be on the same line wrt our views on the solver choice as a function of subproblem characteristics.

Let us consider the different forms that may be found in the literature:

First order Taylor wrt y: g_approx(x) = g0 + sum_i [ dg/dy_i*(y_i(x_i) - y0_i) ]

To be able to set up the dual solve one needs to find x_i(lambda) for all i. This is "easy" when y_i(x_i) is "simple and similar enough" (e.g. same family of functions, for example reciprocal) for all i; then you can use a dual solver. This is also the case for CONLIN, MMA, etc.

Note, however, that in the general case, y_i(x_i) may differ per variable. In this generalized case, x_i(lambda) cannot be found exactly. That is, ALL x_i(lambda) must be findable analytically, which depends on ALL the intervening variables.

In our opinion, therefore, for the general implementation it does not make sense to have a dual solver for any (unknown) combination of y_i(x_i).

However, a dual solver implementation for the most common combinations (e.g. MMA) would be very useful to have (since many people will use this).
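To make the "easy" case concrete (a hedged illustration, using a CONLIN-style mix of reciprocal and linear terms): after collecting the objective and the lambda-weighted constraint terms, the separable Lagrangian contributes, per variable, a term whose stationarity has a closed form,

$$
\mathcal{L}(\mathbf{x},\boldsymbol{\lambda}) = \text{const} + \sum_i \left( \frac{p_i(\boldsymbol{\lambda})}{x_i} + q_i(\boldsymbol{\lambda})\,x_i \right),
\qquad
\frac{\partial\mathcal{L}}{\partial x_i} = -\frac{p_i}{x_i^2} + q_i = 0
\;\Longrightarrow\;
x_i(\boldsymbol{\lambda}) = \sqrt{\frac{p_i(\boldsymbol{\lambda})}{q_i(\boldsymbol{\lambda})}},
$$

with p_i, q_i >= 0 and the result projected onto the box bounds. When the y_i(x_i) are arbitrary and mixed, this closed form is exactly what is lost.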

Second order diagonal Taylor: g_approx(x) = g0 + sum_i[ dg/dx_i (x_i-x0_i) + c_i (x_i - x0_i)^2]

Options for c_i:

  1. c_i = 0.5 * d2g/dx2_i, that is, the diagonal second order derivative is available.
  2. c_i = function(some fit), for example Spherical (fitting the previous point function value) and NonSpherical (fitting the previous point gradient).
  3. c_i based on approximated approximations, that is, c_i = the second order derivative of your approximation function (e.g. first order Taylor with MMA intervening variables).

Here you can use the dual solver (QP), because you know the format, or use ipopt (probably a bit slower). The nice thing about this form is that you can adjust convexity using many different methods and STILL use the same dual solver (since the QP form is still the same).
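Concretely (a sketch, assuming every response g_j carries its own curvatures c_{ji} >= 0 in the form above): stationarity of the separable Lagrangian gives, per variable,

$$
x_i(\boldsymbol{\lambda}) = x_{0i} - \frac{\partial g_0/\partial x_i + \sum_j \lambda_j\,\partial g_j/\partial x_i}{2\left(c_{0i} + \sum_j \lambda_j\,c_{ji}\right)},
$$

projected onto the bounds, which is why any of the three options for c_i above plugs into the same dual solver.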

Second order Taylor wrt y: g_approx(x) = g0 + sum_i [ dg/dy_i (y_i(x_i) - y0_i) + c_i (y_i(x_i) - y0_i)^2]

Finding c_i can be done by fitting the function to the previous point / previous point gradient (Spherical/NonSpherical approximations). Under these circumstances convexity is only proven for specific y_i(x_i) combinations (see the incomplete series expansion paper by Groenwold). To us, combining both y_i(x_i) and c_i does not make sense in general: it becomes cumbersome and convexity is not guaranteed. This form does not add more than any of the previous forms.

dirkmunro89 commented 2 years ago

Ok. Some remarks:

You see how the QP form allows you to write the x(lambda) in general, right? There is one subtlety about the QP form that I worry did not come across. In a first step, you make a quadratically constrained QP; but this thing is a little complicated (second order cone problem), and not easy to guarantee convexity of. In the second step, you throw the curvatures of the constraints in with the curvature of the objective approximation, while multiplying with the lambda at the previous (current) solution point. In effect then, we are estimating the curvature of the Lagrangian. Then, I have a QP (with linearised constraints), but the curvature information has been kept by putting it in the objective curvature / my QP represents the Lagrangian. Then, it is trivially separable (use only diagonal terms), and convexity is trivially enforced: just ensure that each curvature term is greater than some small positive number. Then I can write out the dual form in general, and apply a dual solve in a modular fashion: not caring how you came up with the curvature terms, only caring about their values. Are we on the same page here?
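A minimal sketch of such a modular dual subsolve (hypothetical function, not the SAOR API), assuming the constraint curvatures have already been folded into objective curvatures c_i > 0 as described above, leaving linearized constraints:

```python
import numpy as np
from scipy.optimize import minimize

def dual_qp_subsolve(df0, g0, dg, c, x0, xmin, xmax):
    """Hypothetical sketch of a modular dual QP subsolve (not the SAOR API).

    Subproblem (constraint curvatures already folded into c_i > 0):
        min_x  df0 . (x - x0) + sum_i c_i (x_i - x0_i)^2
        s.t.   g0_j + dg_j . (x - x0) <= 0,   xmin <= x <= xmax
    """
    m = len(g0)

    def x_of_lam(lam):
        # Analytic primal-dual relation from dL/dx_i = 0, projected onto the box.
        x = x0 - (df0 + dg.T @ lam) / (2.0 * c)
        return np.clip(x, xmin, xmax)

    def neg_dual(lam):
        x = x_of_lam(lam)
        dx = x - x0
        w = df0 @ dx + c @ dx**2 + lam @ (g0 + dg @ dx)  # Lagrangian at x(lam)
        # Gradient of the Falk dual: dW/dlam_j = g_j(x(lam)) (envelope theorem).
        return -w, -(g0 + dg @ dx)

    # Maximize the (concave) dual over lam >= 0: a bound-constrained problem of size m.
    res = minimize(neg_dual, np.zeros(m), jac=True,
                   method="L-BFGS-B", bounds=[(0.0, None)] * m)
    return x_of_lam(res.x), res.x
```

Note that the subsolver never asks where c came from (MMA-like asymptotes, a fit, or approximated approximations); it only needs the values, which is the modularity being argued for here.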

Let's leave the second order Taylor wrt y alone, for now; ok? It's a complicated extension / not well known. 2nd order Taylor wrt x --> cool and simple --> let's get to grips with this first.

"To be able to set up the dual solve one needs to find x_i(lambda) for all i. This is "easy" when y_i(x_i) is "simple and similar enough (e.g. same family of functions)" (for example reciprocal) for all i, then you can use dual solver." --> its a little bit more subtle than this; you see it clearly when you derive the CONLIN form, right? How the terms are of such a nature that you can throw them around / factorize in a particular way, so that you can write x(lamba). Play with it; you will see. Not necessarily 'simple' (MMA), or of a similar class (OC, like CONLIN, has a reciprocal function and a linear function).

"Note, however, that in the general case, y_i(x_i) may differ per variable. In this generalized case, x_i(lambda) cannot be found exactly. This means, ALL x_i(lambda) must be able to find analytically, which depends on ALL intervening variables." --> so not separable? Not a good idea in general; disadvantages far outweigh the advantages, as far as I know. Lets stick to the standard separable (equivalent to diagonal quadratic in QP) for now, OK? In fact, SAO is implied to be based on separable / diagonal subproblems; it has always been. So, I dont think this dependence on ALL intervening variables is value added.... there is reason why it is not done this way... and I dont think the severity of the complication is justified in terms of a HUGE improvement in convergent subproblems.

dirkmunro89 commented 2 years ago

This is the SAO philosophy: we devise subproblems which are EASY TO SOLVE. If the subproblem is too complicated (too nonlinear, not separable, etc.) then you are essentially having to apply another NLP algorithm just to solve the subproblem --> this defeats the purpose. Interpret it in the context of nonlinear FE analysis and Newton's method: we cook it up so that we can solve the nonlinear physics by solving a sequence of 'easy' / deterministic / input-output linear systems. Because the linear system is our 'computational kernel', and it has a single solution that we know how to find. It's the same with these subproblems....

dirkmunro89 commented 2 years ago

A separable and convex subproblem is like a linear system in a nonlinear FE analysis loop---> it has one, easy to obtain, solution---> and we exploit this / we on purpose devise the subproblem so that this is true.

dirkmunro89 commented 2 years ago

Finally, to be clear / to complete the analogy: it would not make sense if we devise our Newton iterations (in a nonlinear FE analysis) such that each iteration itself requires multiple nonlinear solves... then we have made no progress in the way we have devised it.... right? We start by saying we need an algorithm for the nonlinear FE solve; and we devise a sequential scheme, the kernel of which itself requires a nonlinear solve... we have not solved the problem... right?

artofscience commented 2 years ago

Finally, to be clear / to complete the analogy: it would not make sense if we devise our Newton iterations (in a nonlinear FE analysis) such that each iteration itself requires multiple nonlinear solves... then we have made no progress in the way we have devised it.... right? We start by saying we need an algorithm for the nonlinear FE solve; and we devise a sequential scheme, the kernel of which itself requires a nonlinear solve... we have not solved the problem... right?

Really like the analogy :)

Understand and agree (up to some point). Indeed from an "elegance" point of view this does not make sense. However, if you know the original problem is "unsolvable without a sequential scheme", but you create subproblems that are "hard to solve, but certain to have a unique solution and you know you can certainly find this solution", then you still win in terms of robustness.

Let's say we have a nonlinear problem that is unsolvable without a sequential scheme, and we have two options:

  1. Solve it using very simple subproblems for which we have the analytical solution, but we know we would need many design iterates, or
  2. Solve it using mildly simple subproblems for which we do not have the analytical solution, but we know we can find the solution in a robust manner, and we know such a scheme needs fewer design iterates.

In that case, using option 2 might still be beneficial.

artofscience commented 2 years ago

You see how the QP form allows you to write the x(lambda) in general, right?

Yes.

There is one subtlety about the QP form that I worry did not come across. In a first step, you make a quadratically constrained QP; but this thing is a little complicated (second order cone problem), and not easy to guarantee convexity of. In the second step, you throw the curvatures of the constraints in with the curvature of the objective approximation, while multiplying with the lambda at the previous (current) solution point. In effect then, we are estimating the curvature of the Lagrangian. Then, I have a QP (with linearised constraints), but the curvature information has been kept by putting it in the objective curvature / my QP represents the Lagrangian. Then, it is trivially separable (use only diagonal terms), and convexity is trivially enforced: just ensure that each curvature term is greater than some small positive number. Then I can write out the dual form in general, and apply a dual solve in a modular fashion: not caring how you came up with the curvature terms, only caring about their values. Are we on the same page here?

Indeed, this I did not know. As far as I understand from the above text, you build the QP problem to solve it in an "SQP-like" sense, similar to what is written here; is that correct?

Do I understand correctly that for such a "quadratic objective with linear constraints" subproblem there is an analytical solution?

Let's leave the second order Taylor wrt y alone, for now; ok? It's a complicated extension / not well known. 2nd order Taylor wrt x --> cool and simple --> let's get to grips with this first.

Fully agree.

"Note, however, that in the general case, y_i(x_i) may differ per variable. In this generalized case, x_i(lambda) cannot be found exactly. This means, ALL x_i(lambda) must be able to find analytically, which depends on ALL intervening variables." --> so not separable? Not a good idea in general; disadvantages far outweigh the advantages, as far as I know. Lets stick to the standard separable (equivalent to diagonal quadratic in QP) for now, OK? In fact, SAO is implied to be based on separable / diagonal subproblems; it has always been. So, I dont think this dependence on ALL intervening variables is value added.... there is reason why it is not done this way... and I dont think the severity of the complication is justified in terms of a HUGE improvement in convergent subproblems.

I think our writing was not clear. We certainly do not intend to generate non-separable approximations. The point we want to make here is that our framework allows the user to combine any type of approximation and intervening variable(s) (thus also different intervening variables for different design variables, and different approximations and/or intervening variables per response function). For this general case, an analytical expression x_i(lambda) cannot be found. That is the reason we would opt to make the iterative primal-dual interior-point solver the default. If a user uses a set of approximations/intervening variables for which an efficient dual solver is available, we can either

  1. automatically detect this and select such a solver, or
  2. leave it up to the user to select this solver (as opposed to the default).

@dirkmunro89 what's your view on this?

artofscience commented 2 years ago

Still not sure if we have the same thoughts about what kind of solver to use for which case.

Say we look "only" at

  1. g_approx(x) = g0 + sum_i [ dg/dy_i*(y_i(x_i) - y0_i) ], and
  2. g_approx(x) = g0 + sum_i[ dg/dx_i (x_i-x0_i) + c_i (x_i - x0_i)^2]

Assume for both cases that the approximations are convex (that is, y_i(x_i) is convex and c_i > 0) and separable.

Let's first look at case 1.

For this case

  1. if y_i(x_i) = Linear for all i, then you can use linear programming (but if stupid/lazy you can also solve it using an iterative solver such as primal-dual interior point). In addition, one can find the dual formulation?
  2. if y_i(x_i) is such that one can find x_i(lambda) analytically for all i, then a dual solver is preferred (but if stupid/lazy you can also solve it using an iterative solver such as primal-dual interior point, which probably costs you extra computational effort)
  3. if y_i(x_i) is such that for some i one cannot find x_i(lambda), then an iterative solver such as primal-dual interior point is the only option. (You could argue you have then built a "too difficult" subproblem, following your text on the SAO philosophy.)

Would you, in general, agree on this?

Let's now look at case 2.

For this case we have a diagonal quadratic approximation for ALL responses. Then we have the following options

  1. Solve using a dedicated "SQP-like" solver that makes use of the structure of the approximation to find the solution analytically (referring to your previous text). Is this correct?
  2. Solve using an iterative scheme such as primal-dual interior point (probably unnecessary computational effort)

Would you, in general, agree on this?

Please correct me where I'm wrong. :)

dirkmunro89 commented 2 years ago

I take it from back to front:

The last question we have to think about. My feeling at the moment is that the complication outweighs the potential benefit. Considering, for example, that you modify these approximations anyway in convergent schemes (e.g. GCMMA). But let's think. I would like to see a case where that advantage is clearly demonstrated.

'Do I understand correctly that for such a "quadratic objective with linear constraints" subproblem there is an analytical solution?' ---> there is a general analytical form of x(lambda) ---> but beyond that I don't know what you mean by 'analytical solution'. Maybe this comment helps (also in the context of the nonlinear FE solve analogy): Why is a QP subproblem nice? Think about it: we are asking for the minimum of the QP subproblem; what does that mean? It means we want to solve the LINEAR system you get when you set the derivative to 0; right? Hence, in optimization, the QP subproblem is equivalent to a linear solve in the nonlinear FE analysis ---> the QP is an expression of a linear solve in the optimization world. (The only complication is dealing with inequality constraints, which is like contact.)
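In the equality-constrained textbook case this is literal (a standard result, stated here only to pin down the analogy):

$$
\min_{\mathbf{x}}\ \tfrac12\,\mathbf{x}^\top Q\,\mathbf{x} + \mathbf{c}^\top\mathbf{x}
\quad \text{s.t.} \quad A\mathbf{x} = \mathbf{b}
\qquad\Longrightarrow\qquad
\begin{bmatrix} Q & A^\top \\ A & 0 \end{bmatrix}
\begin{bmatrix} \mathbf{x} \\ \boldsymbol{\lambda} \end{bmatrix}
=
\begin{bmatrix} -\mathbf{c} \\ \mathbf{b} \end{bmatrix}.
$$

Setting the derivative of the Lagrangian to zero yields exactly one linear solve; inequality constraints and bounds are what reintroduce iteration.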

Now, the first reply: I disagree; that argument does not work so easily, purely from an epistemological standpoint. Let's say we know NOTHING; we start from scratch. The task is: devise a nonlinear solve (because we don't know how to do this). So, we devise a method which still requires a nonlinear solve (the thing we don't have). It doesn't work; it's a recursive argument.

dirkmunro89 commented 2 years ago

My reply was on your first two responses; have not addressed the third, yet.

dirkmunro89 commented 2 years ago

'My reply was on your first two responses; have not addressed the third, yet.'.

The 'last question' I addressed was

"....we can either

automatically detect this and select such a solver, or
leave it up to the user to select this solver (as opposed to the default).

@dirkmunro89 what's your view on this?"

I respond to the one starting with 'Still not sure', now.

dirkmunro89 commented 2 years ago

I think the thoughts in the third response are not correct; the one that starts with 'Still not sure'. The advantages of the separable and convex QP form go beyond the dual solve. Think deeply about the analogy with a linear solve in the process of doing a nonlinear FE solve.... What is more; if you are using an (unnecessarily expensive) general nonlinear programming algorithm as a subsolver, why not use the general nonlinear programming algorithm on the whole problem? It doesn't make sense, logically. It's also a bit like saying: you don't have to do sensitivity analysis, because you can always use finite differences (you know the person who makes that comment doesn't know what they are talking about...). Regardless, test it and show it: verify that what you describe is in fact equivalent (in your implementation), and measure the computational implications.... in the primal dual you are solving a (very sparse; due to separability!!!!!) problem of dimension m+n, I think? And in the dual solve it's a problem of dimension m, with calls to cheap analytic representations.... test it....

I think the problem is: you have not understood this idea of a QP solve being the 'fundamental kernel', because it's the equivalent of a linear solve in the optimization world. With the MMA and CONLIN and reciprocal intervening variables in general, we are not solving a QP, but we have constructed the approximations such that the dual solve has a really easy job. Typically, you see the advantages of the things that were done for the dual solve to have an easy job in a primal-dual solve as well. The relation between sparsity and separability, for example.

artofscience commented 2 years ago

The advantages of the separable and convex QP form go beyond the dual solve.

Thanks for this; I did not know the advantages of separable and convex QP go beyond the dual solve. However, I do not understand from your text what this advantage is, exactly?

Think deeply about the analogy with a linear solve in the process of doing a nonlinear FE solve.... What is more; if you are using an (unnecessarily expensive) general nonlinear programming algorithm as a subsolver, why not use the general nonlinear programming algorithm on the whole problem? It doesn't make sense, logically. It's also a bit like saying: you don't have to do sensitivity analysis, because you can always use finite differences (you know the person who makes that comment doesn't know what they are talking about...).

Agree.

Regardless, test it and show it: verify that what you describe is in fact equivalent (in your implementation), and measure the computational implications....

What are you referring to here with "that what you describe"? I did not and do not claim the iterative primal-dual interior point is "as fast" as the QP and/or dual solve... or are you not referring to that?

in the primal dual you are solving a (very sparse; due to separability!!!!!) problem of dimension m+n, I think? And in the dual solve it's a problem of dimension m, with calls to cheap analytic representations.... test it....

The primal-dual also only solves for dimension m (actually you can choose to solve for m or n, depending on which one is bigger), but you have to do this repeatedly until convergence, which makes it expensive.

Still cannot distill from your response whether you agree on the two cases and possible solvers?

artofscience commented 2 years ago

Bit off-topic:

  1. what are the potential savings (in terms of computational effort) of using dual formulations as opposed to an iterative primal-dual solver?
  2. how do these savings compare to the computational effort of the simulation (solving for the physics)?

Now I remember; this was the reason we did not look into the dual solvers at first. We assumed the cost of the optimization is negligible compared to the cost of the simulation; hence any solver that gives you the unique solution of your separable and convex function will do. And thus the most widely applicable and robust one may be preferred?

dirkmunro89 commented 2 years ago

'Still cannot distill from your response whether you agree on the two cases and possible solvers?' Because the distinctions you draw are not correct. For example, 'SQP-like' vs 'iterative primal-dual' doesn't make sense at all; I don't understand what you are trying to say here. A QP solve is also iterative (if it has inequality constraints; linear solves are also iterative!), and a QP can be solved with a dedicated interior-point primal-dual QP(!) solver. It's 'orthogonal'. Have a look at, for example, the IBM CPLEX QP interior-point primal-dual solver. (I know this solver well from SAND.)

I think it would be wise to build up more experience, before making such comments and decisions; yes?

dirkmunro89 commented 2 years ago

I.e., let's keep it 'open', and test, and learn.

Can also look at it this way: if it's a 'good' general nonlinear programming interior-point algorithm, then it will automatically 'detect' and solve a QP very quickly and easily. (In fact, I think some of the sophisticated solvers detect which rows and columns in the m+n dimension matrix can be subbed out (which is what we do in a dual solve), as I think you suggest above. I think Zillober also describes this 'selection' of which space to solve the (sub)problem in.)

dirkmunro89 commented 2 years ago

Analogy with nonlinear FE solve: I can activate the nonlinear Newton procedure in Abaqus, but, if the physics is linear, then it should solve it in one step. (Of course, I have generated unnecessary overheads by not knowing beforehand that it is actually linear.)

Giannis1993 commented 2 years ago

Please correct me if I'm wrong here, but I think there is some miscommunication.

@artofscience The 'analytic' part of the dual solver refers to x_i=x_i(lambda), which is what allows you to solve only a linear system of dimension m. Then, by using the above (explicit) primal-dual relationship, you get the values of the other n variables. Moreover, for the case of inequality constraints, you must still use an iterative scheme of relaxed Newton iterations (as you do in the pdip) to get the values of lambda_j.

@dirkmunro89 The pdip solver that Svanberg used in his implementation from 1998 onwards (changed from a purely dual solver that he used in 1987 to that one) and Zillober discusses in his 2002 SCPIP paper can solve a linear system for either m or n variables depending on their relative size, and back-substitute to get the values of the remaining variables. Therefore, as far as I understand, such a solver can solve a convex and separable subproblem at least as efficiently as a purely dual one.
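Schematically (a generic sketch, not the exact SCPIP equations), each Newton step of such a pdip solver has a KKT system with diagonal blocks thanks to separability, which can be condensed either way via a Schur complement:

$$
\begin{bmatrix} D_x & A^\top \\ A & -D_\lambda \end{bmatrix}
\begin{bmatrix} \Delta\mathbf{x} \\ \Delta\boldsymbol{\lambda} \end{bmatrix}
=
\begin{bmatrix} \mathbf{r}_x \\ \mathbf{r}_\lambda \end{bmatrix}
\qquad\Longrightarrow\qquad
\left(A\,D_x^{-1}A^\top + D_\lambda\right)\Delta\boldsymbol{\lambda}
= A\,D_x^{-1}\mathbf{r}_x - \mathbf{r}_\lambda,
$$

with $D_x$, $D_\lambda$ diagonal, so one factorizes an m-by-m (or, eliminating the other way, n-by-n) system per Newton step.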

This is my understanding; please enlighten me if I have gotten something wrong.

Zillober2002_scpip.pdf Svanberg1998_mma.pdf

artofscience commented 2 years ago

Please correct me if I'm wrong here, but I think there is some miscommunication.

@artofscience The 'analytic' part of the dual solver refers to x_i=x_i(lambda), which is what allows you to solve only a linear system of dimension m. Then, by using the above (explicit) primal-dual relationship, you get the values of the other n variables. Moreover, for the case of inequality constraints, you must still use an iterative scheme of relaxed Newton iterations (as you do in the pdip) to get the values of lambda_j.

I think you are right :)

@dirkmunro89 The pdip solver that Svanberg used in his implementation from 1998 onwards (changed from a purely dual solver that he used in 1987 to that one) and Zillober discusses in his 2002 SCPIP paper can solve a linear system for either m or n variables depending on their relative size, and back-substitute to get the values of the remaining variables. Therefore, as far as I understand, such a solver can solve a convex and separable subproblem at least as efficiently as a purely dual one.

I think you are a bit wrong... but help me out if I am correcting you incorrectly. Indeed the system can be solved for m or n. However, where a pure dual solver only needs to solve once, the pdip solver solves until some error < tolerance, and then lowers the tolerance again (called epsi in the code); hence this procedure is repeated many times, and is thus much more expensive.

dirkmunro89 commented 2 years ago

I think everything is cool.

The code is simply the best explanation, in a way.... If I have an analytic primal-dual expression, then I can simply and easily maximise a function of m bounded variables. It is not linear in the sense that it is 'not iterative', no. The dual of Falk is a bit funky; it has discontinuous second order derivatives where the primal variables hit the bounds, for example.
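For reference, the Falk dual in question is the standard

$$
W(\boldsymbol{\lambda}) = \min_{\mathbf{x}_{\min}\,\le\,\mathbf{x}\,\le\,\mathbf{x}_{\max}} \mathcal{L}(\mathbf{x},\boldsymbol{\lambda}),
\qquad
\boldsymbol{\lambda}^\star = \arg\max_{\boldsymbol{\lambda}\,\ge\,0} W(\boldsymbol{\lambda}),
$$

where the inner minimization is done per variable via the analytic x_i(lambda); the clipping at the bounds is exactly what produces the kinks in the second derivatives mentioned above.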

I know the pdip solver is different, and I know how it works, and I know Zillober, yes, etc. (check out my SAND work). To be honest, I simply don't know if that statement is true; I think it might be; but I don't know for sure. What I can say for sure is that it was simply impossible to do a dual QP solve of the SAND problem (too big m); and that I required a VERY sophisticated primal-dual QP solver (which has a lot of tricks built in) to handle it (IBM CPLEX). You need an academic license for this solver; it is a VERY expensive solver. Freely available QP solvers simply couldn't handle it, because they didn't have the sophisticated tricks. So my feeling is yes, in theory; however, in a practical open-source free software world, maybe not. But maybe things have developed.

Very simply; we implement a simple dual solve as an option, and we can test it when we please....

Right?

Giannis1993 commented 2 years ago

I think everything is cool.

The code is simply the best explanation, in a way.... If I have an analytic primal-dual expression, then I can simply and easily maximise a function of m bounded variables. It is not linear in the sense that it is 'not iterative', no. The dual of Falk is a bit funky; it has discontinuous second order derivatives where the primal variables hit the bounds, for example.

I know the pdip solver is different, and I know how it works, and I know Zillober, yes, etc. (check out my SAND work). To be honest, I simply don't know if that statement is true; I think it might be; but I don't know for sure. What I can say for sure is that it was simply impossible to do a dual QP solve of the SAND problem (too big m); and that I required a VERY sophisticated primal-dual QP solver (which has a lot of tricks built in) to handle it (IBM CPLEX). You need an academic license for this solver; it is a VERY expensive solver. Freely available QP solvers simply couldn't handle it, because they didn't have the sophisticated tricks. So my feeling is yes, in theory; however, in a practical open-source free software world, maybe not. But maybe things have developed.

Very simply; we implement a simple dual solve as an option, and we can test it when we please....

Right?

Sounds good.

dirkmunro89 commented 2 years ago

We have to think about an elegant way to do it. A dual QP solve is generic and modular, so this is easy, but in general it is only valid in the approx-approx frame. For intervening vars, a simple first step might be to have a dedicated little dual subsolver for CONLIN and MMA separately... which the user has to specify, knowing that the intervening vars have to make sense in that context... I don't know...

what do you think?

dirkmunro89 commented 2 years ago

For example, if the user specifies: I want the MMA, or I want CONLIN; then they can also specify the use of a dual subsolve.

artofscience commented 2 years ago

We have to think about an elegant way to do it. A dual QP solve is generic and modular, so this is easy, but in general it is only valid in the approx-approx frame. For intervening vars, a simple first step might be to have a dedicated little dual subsolver for CONLIN and MMA separately... which the user has to specify, knowing that the intervening vars have to make sense in that context... I don't know...

what do you think?

Exactly this :) Then we allow the user to do anything: MMA + dual solve, MMA + ipopt, etc. We just provide the ingredients.

artofscience commented 2 years ago

@dirkmunro89 can you guide me to your implementation(s) of the CONLIN and MMA dual solvers, so I can try to implement them in our framework? Then we can do a verification against your code (and ipopt)?

dirkmunro89 commented 2 years ago

Yeah cool. CONLIN is sub_con.py in the sub folder. sub_mma.py is the first version of the MMA that Svanberg introduces in the 1987 paper, in terms of the asymptote scaling factors. sub_mma.py is w/o the constraint relaxation scheme. sub_mma_rel.py is the one with the constraint relaxation scheme. The former I am fairly confident is correct, the latter also, but a little bit less. I think you can pretty much copy-paste most of it... likely just some minor changes to make it fit your interface...? Feel free to ask any other details... I can likely add comments... I'll do so later today.

dirkmunro89 commented 2 years ago

*comments which say what variable means what, at entry and at exit

dirkmunro89 commented 2 years ago

but I think you can see clearly... even without comments...