Course of action - Githubissues

antonior92 commented 7 years ago

Ok, so I did some research today and decided to take a bottom-up approach in implementing the algorithm. I intend to follow the sequence:

[x] 1. Implement the projected CG;
[x] 2. Implement the Trust-Region QP problem with Relaxation Method;
[x] 3. Implement SQP Trust-Region method for the barrier problem;
[x] 4. Implement nonlinear (Interior point method) problem as a series of barrier problems.
[x] 5. Integrate solver to scipy.
[x] 6. Implement quasi-Newton options.

Each step uses the previous one as a substep and I will try to create layers of abstractions in order to better cope with that. I think each one of the steps 2, 3 and 4 should take approximately a week to implement.

I already created a wiki for 2. "Implement the Trust-Region QP problem with Relaxation Method".

What do the two of you think about this plan?

nmayorov commented 7 years ago

I have some questions based on the current wiki description.

What is the nature of lb and ub? Are these just initial constraints of the problem which are considered separately because of their simplicity?
What is the justification and purpose of turning the problem from infeasible to feasible?
Am I correct that the 2 additions you will be making are: 1) Considering bound constraints. 2) Considering the relaxation approach to make the problem feasible.

I will read the references your provided to get a better idea, but now what to quickly understand what is going on.

antonior92 commented 7 years ago

What I am calling x include slack variables and the presence of lower bounds lb happens naturally because of these slack variables. I decided to formulated in the more general setting lb<=x<=ub because it doesn't make much difference in terms of difficulty of implementation, but I think it might be useful in the next steps (maybe as a way to deal with bound constraints more efficiently).
The basic idea is that the linear constraints may not be compatible with the trust-region constraint, as illustrated on the figure bellow:

screenshot 2017-06-09 13 52 38

These is troublesome, because these infeasible problems arise naturally during the optimisation algorithm. Nocedal/Wright (p. 546) explanation for how to proceed in this case:

To resolve the possible conflict between the linear constraints and the trust-region constraint, it is not appropriate simply to increase the trust-region until the set of steps p satisfying the linear constraints intersects the trust region. This approach would defeat the purpose of using the trust region in the first place as a way to define a region within which we trust the model to accurately reflect the behavior of the objective and constraint functions. Analytically, it would harm the convergence properties of the algorithm.

A more appropriate viewpoint is that there is no reason to satisfy the linearized constraints exactly at every step; rather, we should aim to improve the feasibility of these constraints at each step and to satisfy them exactly only if the trust-region constraint permits it.

The approach I am adopting to solve this problem is the one from the original paper describing the method and it guarantees the global convergence of the trust-region algorithm.

Exactly.

antonior92 commented 7 years ago

Just posted the next implementation steps on wiki.

I also included some commented references I am using for the implementation.

mdhaber commented 7 years ago

Thank you! The plan looks solid, as usual. Is there anything in the references that you expect issues with? What can we do to help? Matt

On Mon, Jun 19, 2017 at 12:50 PM, Antonio Horta Ribeiro < notifications@github.com> wrote:

Just posted the next implementations steps on wiki.

I also included some commented references I am using for the implementation.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/antonior92/ip-nonlinear-solver/issues/4#issuecomment-309553390, or mute the thread https://github.com/notifications/unsubscribe-auth/AGRCK3ST5FO4sO3h2b9toiB9h9xByeYBks5sFtDrgaJpZM4N1q-a .

antonior92 commented 7 years ago

Thanks Matt! I will report any problem and ask for suggestions along the implementation. But I don't see any potential issue for now. I am starting the implementation today and I will see how things progress.

antonior92 commented 7 years ago

I just corrected the bug Nikolay pointed out to me. There are still a lot of things I would still like to do regarding the SQP solver. For instance I wanted to implement more problems (including the ones Matt suggested me, that I didn't had time to do yet).

But, nevertheless, I think my implementation is robust enough and that I have tested it sufficiently. That way I intend to move on to the main solver. I will present an implementation plan about this last part tomorrow.

antonior92 commented 7 years ago

Ok, just implemented a first draft of the algorithm. Along the rest of the week I will test, improve and tune the parameters of this implementation.

antonior92 commented 7 years ago

Just wrote a blog post about the method here. I focussed mostly on describing the method. I left to present test results in the next blog post.

I think maybe it is a good time to send an email to scipy mailing list reporting my current progress, and with links to the blog and to this repository.

I also think it is a good time to think a little about the project schedule. I have roughly six weeks left. I still need approximately one week to finish the main solver and test it on a larger problem set. This left us with 5 weeks to do three main things:

Implement quasi-Newton Hessians approximation;
Include finite diference derivative approximations;
Create a nice interface to scipy and write documentation.

Implementing quasi-Newton Hessians approximation

I think this should take approximately a week. I would like to implement BFGS, SR1 and L-BFGS. Both BFGS and SR1 are easy to implement. L-BFGS is a little more trick, but I think is one of the most interesting options.

Including finite diference derivative approximation

When I wrote my GSoC proposal I included two weeks to write a graph-colouring scheme to compute sparse Hessians approximations.

I actually don't need this: The method only need the product of the Hessian with a vector. So there is no need to use a graph colouring scheme to find an optimal set of vectors for an given sparse structure.

This make everything a lot easier. I only need to implement simple finite-differences schemes. Furthermore I could use the sparse jacobian approximation Nikolay wrote during his GSoC.

If I have some extra time I could think of including some other derivatives options, as including options for algoPy or other libraries (if they are available).

Write interface and documentation

I think as soon as the solver is ready I could starting implementing an scipy interface and create a pull request. The other two tasks (finite-differences and quasi-newton approximations) could be done concomitantly.

mdhaber commented 7 years ago

"If I have some extra time I could think of including some other derivatives options, as including options for algoPy or other libraries (if they are available)." Yes, please!! There is a lot of talk about how only approximate derivatives are needed for optimization and so automatic differentiation is not really useful. But in my experience, there can be a huge difference in performance - both speed and quality of the solution - depending on whether analytical, automatic, or finite-difference approximations are used. Perhaps in early iterations derivatives don't need to be very good, but I think having accurate derivatives really improves convergence as a solution is approached. I did computational optimal control of highly nonlinear, hybrid-dynamic systems (running robots) in grad school and found that finite difference derivative approximations (at least the methods implemented in the software I was using) were basically unusable, automatic differentiation was quite good, and analytic derivatives were better still (because they were a little faster to evaluate). Some of this is language- and implementation-dependent, but I still think it is well worth the effort to integrate AD into your solver. Personally, I would prioritize it over some of the other options here.

On an unrelated note, for the LP solver, I found input validation and standardization to be quite time consuming to write and test. It sounds like I went a bit overboard, and maybe didn't use the most efficient approach. But I think you have a great solver here and that makes finishing touches - like allowing the user some flexibility with input and nice warning/error messages explaining how to fix whatever is going wrong - even more important, because you want to keep the quality of the user experience as high as the quality of the solver. I mention this because you might want to explicitly think about this sort of thing separate from unit testing and benchmarking of the algorithm itself, as it can take some real time, too.

Nice blog post by the way. Very clear explanation of the barrier function approach.

antonior92 commented 7 years ago

Hey Matt, totally agree with you about the derivatives: automatic differentiation can be significantly better than finite differences. I will invest some time in it.

About input validation and standardisation: thanks for the heads up. I think I always tend to underestimate the time this kind of thing takes.

rgommers commented 7 years ago

If I have some extra time I could think of including some other derivatives options, as including options for algoPy or other libraries (if they are available).

Because there's no clear best/default AD library, maybe it's worth thinking about whether you can do something like use diff_solver=None in your API, and let users pass in what they'd like. If you can find something generic enough, that allows using algopy / autograd / ad / pyadolc / ...

antonior92 commented 7 years ago

Thanks for the suggestion Ralf! It seems sensible to do something as general as possible and I will try to provide these options to the user.

nmayorov commented 7 years ago

Antonio,

Can you outline all the options for the Hessian you are planning to include? There are quite a lot of possible combinations, for example MATLAB's approach is listed here.

Actually for gradients and Jacobians as well.

As for the auto diff packages: to be honest I don't see why they should be a part of the solver and how it could be possibly implemented (i.e. interaction of the solver and autodiff package). In my opinion, if a user wants to use autodiff --- he is free to do so and provide gradients (Jacobians, Hessians) as callable functions. This looks like the most generic and clean approach. Did I miss something?

antonior92 commented 7 years ago

We basically need to compute 3 derivative related function. Each one have some options I would ideally like to make available:

gradient of objective function;
- Finite differences;
- Automatic Differentiation;
Jacobian of the constraints;
- Finite differences;
- Automatic Differentiation;
Lagrangian Hessian;
- Finite differences;
- Automatic Differentiation;
- SR1;
- BFGS;
- L-BFGS;

I don't have a clear plan yet about the derivatives yet and I am bringing it to discussion to help me figure out how to proceed. I think it is highly unlikely that I may be able to implement all of them.

So I think we should prioritise. The Lagrangian Hessian is the absolute priority because it is pretty hard for the user to provide it for a complex problem. I would implement the following options for the lagrangian Hessian:

Finite differences;
SR1;
BFGS;
L-BFGS; Note that I am leaving the automatic differentiation out (Even after all the positive things I said about it). That is because I think it is very convenient to compute the Hessian product with a vector (which is what is actually require) using finite differences. In this case the cost of computing it is not very high and the second order information usually does not need to be that accurate.

That said, I think the next priority should be the Jacobian. And I intend to use something of what you have already implemented in your google summer of code. Ideally I would like to have:

Finite differences;
Automatic Differentiation; options.

The least important in my opinion is the gradient (and also the easier to implement), which the user should not have too much difficulty to provide.

As for the auto diff packages: to be honest I don't see why they should be a part of the solver and how it could be possibly implemented (i.e. interaction of the solver and autodiff package). In my opinion, if a user wants to use autodiff --- he is free to do so and provide gradients (Jacobians, Hessians) as callable functions. This looks like the most generic and clean approach. Did I miss something?

I disagree with you there. You can actually apply the same argument to finite-diference approximations. The point is to provide easy derivative options in order to makes everything much more convenient for the user.

nmayorov commented 7 years ago

I disagree with you there. You can actually apply the same argument to finite-diference approximations. The point is to provide easy derivative options in order to makes everything much more convenient for the user.

In my opinion providing an option without any concrete implementation is not useful. I just don't understand what does "providing automatic differentiation" means in our context. If someone will implement it as a part of scipy --- it would be amazing, otherwise I don't see much point mentioning it all.

As for finite difference --- I think it is another (slightly less convenient in my opinion) alternative to provide function wrappers to give derivatives (numdifftools approach), but the approach to handle this in solver options looks more traditional for scipy (and other optimization packages).

I will read through derivative options later.

antonior92 commented 7 years ago

An automatic differentiation implementation would require too much work for the time I have. The idea was something in the line of providing an option to use an external library to provide automatic differentiation, like algopy.

Your point of view make sense: this should not be a priority, since the user can do it by himself. Nevertheless it would be a nice option to have and if I have time I should do it.

What do you guys think about the following (tentative) proposal:

Finish the solver;
Create a pull request;
Implement quasi-Newton options (SR1, BFGS, L-BFGS) - I think this should be a priority because it is something the user cannot do by himself;
Implement finite-difference options to compute the Lagrangian Hessian;
Implement finite-difference options to Jacobian. Including sparse structure (Which you have implemented);
Include options to automatic differentiation for the gradient and for the Jacobian;

antonior92 commented 7 years ago

By the way, I would like to remember that, as I mentioned in my proposal, I will not be able to work on the project along this week because I am attending to IFAC congress.

Nevertheless, I have internet access and we can use the time to discuss some fine points of the project: what to do with the derivatives, how to interface it to scipy, and this kind of things.

mdhaber commented 7 years ago

Hi Antonio, The plan looks good. I would suggest that you implement only one quasi Newton option before moving on to the rest, and come back to the other quasi Newton options before doing AD if you wish. The rationale is that I think people are more likely to start testing when they don't have to implement their own derivatives, and this way that will be ready sooner. I'd also suggest that when you create your PR (which I assume will include adding documentation wherever you haven't already)

mdhaber commented 7 years ago

Oops. I'm on a phone and accidentally clicked the close button. I'll clean all this up when I'm back at a computer.

I'd also suggest that when you create your PR (which I assume will include adding documentation wherever you haven't already) that you make it so that all fields except for the objective are optional, if that's not yet the case. If others are like me, they'll want to test with a minimal example (e.g. unconstrained Rosenbrock) and move up from there.

antonior92 commented 7 years ago

Hey Matt,

About the quasi-Newton methods: I don't see much advantage in doing one first and the others for latter. Mostly because once I have implemented and tested one of the methods the others will be much easier to implement and test. Furthermore I will be able to use one method to help validate the others.

I believe that if I do all of them in sequence it would not take me more than a one or two extra days to do it, because everything would be fresh for me. If I left it for latter it will probably take more time and I would not be able to compare quasi-Newton methods once they have been implemented.

About the optional parameters: At the last commit I have made almost all the constraints optional. It still does not work for unconstrained problems, but I will tend to it once I go back to this.

nmayorov commented 7 years ago

Hi, Antonio!

Let's try to clear out options for derivatives. If I'm not mistaken we agreed that the solver should have two variants: dense and sparse, and the difference is coming from projections.py. The basic logic would be --- if any matrix structure is sparse then we use a sparse version, otherwise dense. Am I correct here?

Gradient of the objective function:
1. Provided by a user.
2. Estimated by finite differences.
Jacobian of constraints:
1. Provided by a user.
2. Estimated by finite difference, possibly given a sparsity structure.
Hessian of the Lagrangian:
1. Provided by a user as a matrix or LinearOperator --- tricky to do, but some may want to do that anyway.
2. Quasi-Newton approximations. Do different approximations lead to sparse or dense problems?
3. Estimated by finite difference. Here we have considerations:
  1. Estimating a full sparse or dense matrix --- are we going to support this? Estimating the full matrix can be too expensive and efficient estimation with sparsity structure looks like a challenging problem.
  2. Estimating products of the Hessian with a vector. Does it automatically lead to the sparse version of the algorithm? It looks like that. Say a user didn't provide analytic first-order derivatives, do we allow to use this options in this case? That is we use finite-difference estimation of the first-order derivatives and then use them again. In my opinion it looks like a troublesome road and maybe we should disable that (i.e. quasi-Newton approximation is used in this case).

There is another considerations. As there is no scipy.diff package coming from Ashwin we generally want to expose numerical derivative methods (albeit limited in capabilities) to scipy users and thus implementing a Hessian approximation (with sparse structure) might be quite useful. Are you ready to figure it out and write this code?

In the end we would like to have (I wrote that before): num_jac (grad goes here as well), num_hess, check_jac and check_hess.

About auto-differentiation packages. I'm having trouble to understand how it can be made general and robust enough to include such option to the solver. I guess we may try to target one selected implementation (autograd looks like the most solid and popular and seem to work with numpy really well). But even then --- what if they change something and people's code breaks? Who will be responsible for that? I really don't think it is wise to include such dependencies. A scenario when a user implements differentiation in whatever way is convenient to him and passes the prepared functions to our solver looks totally adequate and satisfactory to me.

What is your opinion?

Your plan overall looks good, I think it's time to send a pull request. It is much more convenient to review it there.

rgommers commented 7 years ago

About auto-differentiation packages. I'm having trouble to understand how it can be made general and robust enough to include such option to the solver. I guess we may try to target one selected implementation

I think the idea is not to do that, because that indeed is problematic when something changes in that target library. Plus it would introduce an optional dependency.

A scenario when a user implements differentiation in whatever way is convenient to him and passes the prepared functions to our solver looks totally adequate and satisfactory to me.

So maybe you're saying the same thing, just in different words. I was thinking about something like the "custom minimizers" for minimize. And then in the tutorial a couple of examples can be added of how you pass an autograd or algopy solver in.

antonior92 commented 7 years ago

Thanks for the comments Nikolay and Ralf!

A few notes bellow.

Hessian of the Lagrangian: iii. Estimated by finite difference. a. Estimating a full sparse or dense matrix b. Estimating products of the Hessian with a vector.

The idea is to only do (b). The fact that we don't need to actually compute the full Hessian is one of the major advantages of the algorithm and makes the finite difference approximation very cheap. There is no advantage in actually estimating the full matrix.

Estimating products of the Hessian with a vector. Does it automatically lead to the sparse version of the algorithm?

Yes

Say a user didn't provide analytic first-order derivatives, do we allow to use this options in this case? That is we use finite-difference estimation of the first-order derivatives and then use them again. In my opinion it looks like a troublesome road and maybe we should disable that (i.e. quasi-Newton approximation is used in this case).

I tend to agree with you in there. When we have the options implemented we could do some tests to help us evaluate that.

There is another considerations. As there is no scipy.diff package coming from Ashwin we generally want to expose numerical derivative methods (albeit limited in capabilities) to scipy users and thus implementing a Hessian approximation (with sparse structure) might be quite useful. Are you ready to figure it out and write this code?

As I said before, when I wrote my GSoC proposal I included two weeks to write a graph-colouring scheme to compute sparse Hessians approximations, but actually it would not be used by the solver at all. The solver only needs the product of the Hessian with a vector (So there is no need to use a graph colouring scheme to find an optimal set of vectors for an given sparse structure).

Therefore, my plan is to leave the implementation of a Hessian approximation (with sparse structure), as low priority, since it will not be useful as an option to the solver.

About auto-differentiation packages. I'm having trouble to understand how it can be made general and robust enough to include such option to the solver. I guess we may try to target one selected implementation (autograd looks like the most solid and popular and seem to work with numpy really well). But even then --- what if they change something and people's code breaks? Who will be responsible for that? I really don't think it is wise to include such dependencies. A scenario when a user implements differentiation in whatever way is convenient to him and passes the prepared functions to our solver looks totally adequate and satisfactory to me.

I agree with you there. It may be troublesome to include optional dependencies that may break after a few months/years.

Ralf suggestion is good. We could not include autodifferentiation options explicitly, but write an extensive documentation explaining how autograd and other packages could be used to the compute derivatives and how to pass them to the solver.

Maybe this is the best solution: it alleviate the burden of computing the derivatives (we could provide some examples that could be easily paste and copied by the user) and we don't need to include optional dependencies.

Quasi-Newton approximations. Do different approximations lead to sparse or dense problems?

BFGS and SR1 results in dense Hessians. L-BFGS is more appropriate for sparse structure since it only computes the Hessian vector product.

Ok so the proposal is to implement:

Gradient of the objective function:
1. Provided by a user. **
2. Estimated by finite differences.
Jacobian of constraints:
1. Provided by a user as a dense matrix or sparse matrix. **
2. Estimated by finite difference, possibly given a sparsity structure.
Hessian of the Lagrangian:
1. Provided by a user as a dense matrix or sparse matrix or LinearOperator.
2. Quasi-Newton approximations.
3. Estimating products of the Hessian with a vector (finite-differences).

** Include examples using automatic differentiation in the documentation

What do you guys think?

rgommers commented 7 years ago

All sounds reasonable to me (disclaimer: I'm not really up to speed on the details here)

mdhaber commented 7 years ago

And to me.

antonior92 commented 7 years ago

Hey guys, I just had a slow start this week. Sorry about that. I will make up for it during this weekend so I don't get behind schedule.

Today I implemented some unittests taken from Matlab documentation and from textbooks. The solver seems to be working well enough on those simple problems. The next step is to test on large and hard to solve problems.

I intend to move on to test the solver on problems from CUTEst. In order to do that, I will create a parser that converts constraints in the format (that is used by CUTEst):

h(x) = 0
c_lb  <= c(x) <= c_ub
A_eq x = b_eq
b_lb <= A_ineq x <= b_ub
lb <= x <= ub

to the Matlab-like format I am using. This will also be useful in future steps, since this parser can be used in the API for constraints we have been discussing in #5 .

antonior92 commented 7 years ago

A little update on the current status of the project: Currently I am testing the solver on problems from CUTEst. I am still getting a few failures (I believe is some implementation error: maybe the derivatives calculation or some internal logic). The plan is to fix this up and finish the ipsolver implementation till the end of this week. I intend to present a pull request as soon as the solver is ready.

The initial plan was to have this finished last week. This, unfortunately, didn't work out and I am a little behind schedule.

rgommers commented 7 years ago

Thanks for the update Antonio. I'm not too worried that you're a little behind - schedules are never perfectly followed. Try to catch up of course, but if that doesn't work out it's not a major issue.

I like the quality of the discussion on the minimize_constrained issues by the way.

antonior92 commented 7 years ago

This week I ended up testing and improving the solver. Mostly, I corrected lots of minor implementation details that could make the solver fails under some circumstances. Now the solver manage to solve 67 out of 71 problems I tested it on (some of them need different initial parameters), in three of the cases it is failing it does get close to a solution and the other case is one of the known failures reported on the original paper. I will write a blog post detailing these results latter.

There are still lots of improvements for me to work on but I think it is past time I start integrating it to SciPy. So my plan for the next week is to work on the interface and on the documentation, so we can fit what we have implemented to SciPy.

I have a new entry to the wiki, where I detail what steps I intend to take. I included the finite differences implementation as one step of the integration to SciPy because Nikolay have already implemented almost everything I wanted during his GSoC, so I think it should not require much work from me.

I have basically four weeks left and the plan is basically this:

Week 1:
- Do interface improvements and integrate the solver to scipy.
- Implement finite diference options.
- Present a pull request by the end of the week.
Week 2 and 3:
- Implement quasi-Newton methods.
- General improvements on the interface.
- Implement suggestions I get on the pull request.
Week 4:
- Work on final points of the implementation.
- Improve documentation.
- Testing and parameter tuning.

mdhaber commented 7 years ago

Hi Antonio, The success rate sounds very good. I think having default parameters work for most, requiring parameter tweaks for a few problems, and coming close to the solution on others is acceptable; that's all I can claim for my linear solver. I agree with moving forward. The improvements can be fun things to work on after GSoC - yes, we hope you'll continue to contribute : ) But in the interest of getting the most benefit to users the soonest, the interface and documentation are a higher priority. Your plan forward sounds good! I'm really looking forward to the finite differences; it helps to cut out the user's (my) derivative implementations as a source of error in problem definition : )

antonior92 commented 7 years ago

Yesterday I had a talk with Nikolay and we came to the conclusion that the amount of work of integrating the solver to minimize can be quite overwhelming.

The major dificulties is to come up with something that is consistent for all the solvers. For instance:

There already exist a legacy finite-difference function implemented that we would have to replace (to use the more complete implementation from Nikolay GSoC that accepts sparse structures).
There exists two legacy "constraint-parsers" one for COBLYA and other for SLSQP that we would have to replace by our more complete version.
And so on...

Ideally we would like to replace this old structures and unify all the solvers under the same unifyied implementations for finite-difference, constraint-parsing, and so on...

However, modify these legacy structures is time-consuming and require a lot of modifications on stable and well tested code. So the idea is to follow the sequence:

Create an interface to our solver that could be independently integrated to scipy.
Create an interface to our solver that can be integrated to minimize without thinking too much about the other solvers. That is, it will probably have a lot of options that are available only to our solver (even for features that could be reused for the other solvers)
After that we could start to try to unify the options. That is, little by little we would make the implemented features available for other solvers and try to replace the legacy structures by the ones we have implemented.

For instance the constraint classes I implemented yesterday (that is a more complete alternative to the dictionaries being used): I think its is better to make them available first only to our solver and integrate it to COBLYA and SLSQP as a second step. Same thing for finite difference options and quasi-Newton options. Otherwise I think it can be quite overwelming to try to change all these legacy structures at once.

nmayorov commented 7 years ago

Your summary sounds excellent. Basically you said it much better than how we discussed it. I think your point 1, 2, 3 is the way to go.

mdhaber commented 7 years ago

OK, do what is needed to get it into a state people can test. I suppose the desired interface can come later.

rgommers commented 7 years ago

Definitely first (2) and then (3) makes sense. From the examples given I'm not clear on whether (2) is really much more work than (1). For (1), are you thinking about a public or semi-private (to be used for testing only, not part of the permanent API?

Sorry for the slow reply by the way - I'm in the Alps with limited internet.

antonior92 commented 7 years ago

I am almost finishing stage (1): I have implemented a minimize_constrained function that makes the interface for our two solvers (equality_constrained_sqp and tr_interior_point). I have made it a public interface. You can take a look here to see the current status.

Creating a separate interface was the fastest route o achieve a state "people can test". I am working on some final points and I intend to open a Pull Request with this interface in two or three days. Even if this public interface is not the final one, I think it will be easier to discuss the interface with something more concrete in mind than in abstract terms.

Furthermore, another major advantage of implementing (1) before (2) is that we will have another opportunity to rethink the integration of the solver to the minimize interface with a concrete idea of what parameters and options will have to be included.

nmayorov commented 7 years ago

@rgommers sorry that we sort of ignored your view, but I think that the chosen approach is preferable (and I guess not really much different compared to how you see it).

Having an independent (not connected to minimize) implementation in scipy is already an excellent achievement and fully operational thing. Considering that there is not so much time left I think it's better to have that done until the end of the GSoC, rather than having complicated "work in progress" going. But I think it's quite likely Antonio will be able to finish all 3 points in time (or in 1-2 weeks after the GSoC) following this sequence.

Also +1 to what Antonio said in the last comment: moving towards a concrete independent implementation will help greatly in determining how it should be integrated into minimize.

rgommers commented 7 years ago

Even if this public interface is not the final one, I think it will be easier to discuss the interface with something more concrete in mind than in abstract terms.

Furthermore, another major advantage of implementing (1) before (2) is that we will have another opportunity to rethink the integration of the solver to the minimize interface with a concrete idea of what parameters and options will have to be included.

good points, makes sense.

@rgommers sorry that we sort of ignored your view,

don't think you did, no worries. and if you did, it's my fault for not having internet for a week:)

antonior92 commented 7 years ago

A little update: I have schedule to open the pull request by the end of last week, but I end up finding lots of small details that needed some working:

I included an options for choosing if the Jacobian used internally is dense or sparse (As we have initially agreed upon) and some mechanism to choose which representation to use automatically when needed.
I included an option to guarantee that some specified constraint remains feasible along the iteration.
I included a nice callback and a complete structure for return the results.
I also tested the parsing between constraints types in different situations. This is important because the constraint parser is very error-prone part of the code since I end up relying on the functions np.vstack and sparse.vstack, which can have some odd behaviour in some special situations.
I fixed the solver for the problems READING1 and OPTMASS.

I thinking I am finally approaching a stable and user friendly interface for the solvers. A few details I want to work before the pull request:

Include a “verbose" option for the solver;
When a singular Jacobian appears the solver just fails (This is described in the original paper). I want to see if I use the svd decomposition in this case to avoid this failures.
The solver takes different number of iterations on my two different computer when solving the problem DIXCHLNV and I want to find out why.
I need a good start procedure to initialise the slack variables. I think this is really degrading the performance of the solver.
The docstring of the solver and constraint classes need some improvements.
I am working on some blog post displaying the results in a organised way (here).

The plan is to fix this points along the next two and three days and open the pull request. I definitely want to have this finished before next week, so I can have the final two GSoC weeks for working on derivatives options.

mdhaber commented 7 years ago

Re: singular Jacobian - so you DO experience this some time : ) And it sounds like you want to deal with it by reverting to a more robust solver. Hmm... Sounds familiar... Re: iterations on DIXCHLNV - I suggest exploring that after the PR is up. I've noticed that 'scipy.linalg.solve' appears to work differently on my Mac and PC. I wouldn't be surprised if the solver you are using does. It doesn't sound like a problem to me. Re: blog post - nice results. Only if it's not too much of your time (just your computer's time) - can you include performance of SciPy's existing SLSQP and another solver, maybe IPOPT, for reference?

antonior92 commented 7 years ago

Re: singular Jacobian - so you DO experience this some time : ) And it sounds like you want to deal with it by reverting to a more robust solver. Hmm... Sounds familiar...

Haha, in my defence, when you asked me about this I had not faced this problem yet : ) But, as you can see, our talk did manage to influence my solution.

Re: iterations on DIXCHLNV - I suggest exploring that after the PR is up. I've noticed that 'scipy.linalg.solve' appears to work differently on my Mac and PC. I wouldn't be surprised if the solver you are using does. It doesn't sound like a problem to me.

I will take some time to see what is happening.

Re: blog post - nice results. Only if it's not too much of your time (just your computer's time) - can you include performance of SciPy's existing SLSQP and another solver, maybe IPOPT, for reference?

I will try to include some comparison results, and the code I used for generate the table, and some analysis of the results. There still much work to do on it.

rgommers commented 7 years ago

The plan is to fix this points along the next two and three days and open the pull request.

If there's still more than a day to go: it may be good to open the PR already and just add a checklist at the top of the TODOs. That way code review can already be done on the 98% that is ready.

nmayorov commented 7 years ago

That's a good idea, and I suggest to follow it )

---- On Sat, 12 Aug 2017 08:57:38 +0500 Ralf Gommers <notifications@github.com> wrote ----

The plan is to fix this points along the next two and three days and open the pull request.

If there's still more than a day to go: it may be good to open the PR already and just add a checklist at the top of the TODOs. That way code review can already be done on the 98% that is ready.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

antonior92 commented 7 years ago

Actually, I did manage to implement or fix most of the points I have mentioned above.

I will just fix the docstrings and them open the pull request.

antonior92 commented 7 years ago

Just opened the pull request :)

https://github.com/scipy/scipy/pull/7729

antonior92 commented 7 years ago

My plan is to start working on derivatives options tomorrow and to work on the integration to scipy along the way.

antonior92 commented 7 years ago

I have been thinking and it would be better if we not allow the use of finite-diferences to compute simultaneously first and second derivatives. So the idea is, if someone compute the gradient and the Jacobian using finite differences he needs to use a quasi-Newton option to compute the Hessian. This seems like the safest option for me.

This design choice is the main reason I want to implement the quasi-Newton options before the finite-difference approximations to first-order derivatives. So I am thinking on the following schedule:

Start by implementing options for second derivatives, which includes
- [ ] Finite-difference options;
- [ ] Quasi-Newton options
  - [ ] Set an general api for all quasi-Newton methods
  - [ ] Implement SR1
  - [ ] Implement BFGS
  - [ ] Implement L-BFGS
  - [ ] Test different options on problems from CUTEst
In sequence, implement options for first derivatives (which will basically require me to integrate what Nikolay have already implemented):
- [ ] Finite diferences to compute Jacobian
- [ ] Finite differences to compute gradient

What do you guys think about it?

mdhaber commented 7 years ago

Sounds good. I suggest not waiting for consensus on the order to begin working on them, as you intend to do them all.

antonior92 commented 7 years ago

A little update: During the last week I have worked on the finite differences approximation integration to the solver (which took me a little more time than I have originally expected). I am finishing refactoring the constraint class and should have the finite difference approximations ready by tomorrow. This leave me the last week of the GSoC (22-29 of August) to work on quasi-Newton approximations.

About the submission of my GSoC work product: I am thinking of creating a blog posting pointing to this repository, to my pull request and to the other blog posts. Does anyone have any suggestions about this?

mdhaber commented 7 years ago

Your other blog posts have been good. Were you looking for suggestions on something in particular? Once you post, I can give feedback and check that it fulfills the requirements here.

antonior92 commented 7 years ago

Thanks Matt :)

I can't think on anything specific. I will write the blog post and I will present it here. I have checked the requirements and I think it falls into the first suggestion:

Create a blog post or web page or public GitHub gist that describes the work you've done and links to the commits you've made and repositories you've worked on. If there is still work to be done on the project, include that too. You can also share highlights or challenging pieces.

antonior92 / ip-nonlinear-solver

Course of action #4

Implementing quasi-Newton Hessians approximation

Including finite diference derivative approximation

Write interface and documentation