I imagine you want some simpler tests, too. You can get a few of them that are used as examples for other NLP solvers or in textbooks. For example: https://www.mathworks.com/help/optim/ug/fmincon.html These simple examples with only some types of constraints could be useful for debugging if a bug is introduced. But it would also be nice to have some large problems, so maybe implement a few CUTEst problems in Python. The purpose of these would be to show how your solver handles the most general, large-scale problems.
BTW, this is awesome. I am really, really impressed by the performance. You know, your solver wasn't supposed to beat KNITRO in terms of speed, even if KNITRO was doing a lot of unnecessary work reducing the optimality without much progress on the objective. What's next? unit tests and benchmarking?
Hey Matt, thanks for the suggestions! I will try to take some examples from the MathWorks website and from textbooks.
About the performance, I find it unexpected that my solver beats the results from KNITRO in the paper. But it makes sense: the paper is from 2004, so I probably have better hardware. Besides that, I think KNITRO runs more iterations than I do. Later I will do some comparisons with IPOPT, both running with the same configurations, so we can get more realistic results.
But, nevertheless, it shows that our implementation is reasonably efficient. :)
You could also try your solver on some of the scalable/large linear problems in the new linprog_ip unit tests. If it beats that it would be pretty awesome for you, embarrassing for me, and funny/impressive for us all ;-) Matt
It is a nice idea. I will try it later, after I have implemented the last bit of the algorithm. For now the algorithm is not able to deal with inequality constraints yet, only with problems of the type:

```
minimize    f(x)
subject to: c(x) = 0
```
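As a concrete illustration of this problem class (my own toy instance, not a problem from this thread), here is a tiny equality-constrained problem checked against SciPy's existing SLSQP as a stand-in reference solver:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative instance of the class above:
#   minimize    f(x) = x1^2 + x2^2
#   subject to  c(x) = x1 + x2 - 1 = 0
# The analytic solution is x = (0.5, 0.5).
res = minimize(lambda x: x[0]**2 + x[1]**2,
               x0=[2.0, 0.0],
               method="SLSQP",  # stand-in solver; not the one developed here
               constraints={"type": "eq", "fun": lambda x: x[0] + x[1] - 1})
print(res.x)  # ~ [0.5, 0.5]
```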
I made some modifications in the CG tolerance in order to be able to get more accurate solutions. I managed to get the optimality of the solution as low as `2.1e-05`, which is much better than the `9.6e-03` I was getting before. I still didn't manage to get it as low as `4.0e-07`.
I will try to compare the solution of my solver and KNITRO in other problems to try to find out if this should concern us or not.
> I made some modifications in the CG tolerance in order to be able to get more accurate solutions.

Isn't this discussed in your main source ([1] from wiki)? I see the tolerance `eta * ||grad||` in `projected_cg`, but I can't quickly figure out how it is chosen; it doesn't look like the scheme for Newton-CG from Num. Opt.: `min(0.5, sqrt(||grad||)) * ||grad||`. My understanding is that with this approach the (global) objective function gradient is `c` and the tolerance should be chosen based on `c`. Can you elaborate?
I was using `tol = 0.01 * np.sqrt(rt_g)`, which was suggested in the paper. But I realised that for low enough `rt_g` we have `0.01 * np.sqrt(rt_g) > rt_g`, which was causing the CG to make no progress at all. I changed the definition to `tol = min(0.01 * np.sqrt(rt_g), 0.1 * rt_g)`, which guarantees that `tol < rt_g` always holds. This provided some improvement on the solution.

So far I have tested on `ELEC` and `CHANNEL`. I presented to you the results on `ELEC`. On `CHANNEL` the start point is an optimal point that violates the constraints; the algorithm converges very fast to an optimal solution (8 iterations) and I found nothing interesting about the problem or the results (everything seems to be working well on it).

I was having trouble finding optimisation problems with only equality constraints. But just today I found a lot of test problems Nocedal himself used to test the Byrd-Omojokun implementation. They are described in [12] (commented references). I will do some tests on them.
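The safeguarded tolerance rule described above can be sketched as follows (the variable name `rt_g` follows the thread; the wrapper function is illustrative):

```python
import numpy as np

def cg_tolerance(rt_g):
    """Safeguarded CG stopping tolerance, as described above.

    ``rt_g`` is the residual quantity the tolerance is compared against.
    Taking the min guarantees tol < rt_g, so the CG loop can always
    make progress even when rt_g is small.
    """
    return min(0.01 * np.sqrt(rt_g), 0.1 * rt_g)

# For large rt_g the sqrt term dominates; for small rt_g the safeguard kicks in:
cg_tolerance(1.0)   # 0.01
cg_tolerance(1e-6)  # ~1e-7, strictly below rt_g
```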
Are you referring to COPS? Maybe it is a good option.
I want to show some results and share some of my concerns before discussing the plan for this week.
After some modification I am getting:
```
total time  = 7.1 sec
f           = 1.843891e+04
optimality  = 3.4e-07
c violation = 1.3e-15
niter       = 113
```
which seems pretty reasonable to me. Note that the optimality now is better than the one originally attained by KNITRO on the given table.
I am still tuning some of the internal parameters and I intend to do tests on a more diverse set of problems. Reference [12] just gave me some good ideas.
What is still troubling me is the way it behaves when it gets very near the solution point:

![optimality_elec_soc](https://user-images.githubusercontent.com/16557411/27592810-c81d08e8-5b2b-11e7-8269-96c5e7824b62.png)

As you can see, a very big jump happens near iteration 90, and after that the algorithm doesn't progress any further because the trust-region radius starts to decay really fast. No matter how I set the stopping criteria, I can't get any further reduction.
The idea of the algorithm reaching a point where it stops making progress is really troublesome for me, because it seems to contradict the superlinear convergence of the algorithm near the solution. I think it is happening just because of large roundoff errors, but any thoughts on this would be much appreciated.
My schedule for this week:

- Today and tomorrow: implement unit tests, do more tests from CUTEst (the same ones Nocedal used in [12]), and do some fine-tuning of the parameters.
- Thursday and Friday: write the implementation plan for the last part of the implementation (sequence of barrier problems) and implement a first draft of it.
- Next week: test the implementation and tune the parameters.

As I have described in my GSoC proposal, I will attend the IFAC World Congress from 9-14 July (two weeks from now). So my plan is to have a first version of the implementation of the final algorithm before that.
Maybe test another solver on the same problem starting near the solution produced by your solver? Does it produce a better solution? How does it behave?
A test problem is attached. It comes from solving for coefficients and step sizes for numerical differentiation to get the highest-order formula possible. You're solving for the `a_i` and `b_i`. Start with `n = 3` and `m = 3`, then try `m = 4` and higher. Is it still feasible for higher `m`? (You probably need to normalize the equations, as the higher-order sums can become very small just from small `b_i`.)
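The attachment itself isn't visible in the thread, but a problem of the kind described can be written as a set of equality constraints from the Taylor order conditions for a formula `f'(0) ≈ Σ a_i f(b_i)`. The exact formulation below is my guess, not Matt's file:

```python
import math
import numpy as np

def order_conditions(x, n, m):
    """Residuals of the Taylor order conditions for f'(0) ~ sum_i a_i f(b_i).

    x packs the 2n unknowns [a_1..a_n, b_1..b_n]. Matching Taylor terms up
    to order m gives:
        sum a_i            = 0   (no f(0) term)
        sum a_i b_i        = 1   (first-derivative term)
        sum a_i b_i^k / k! = 0   for k = 2..m (higher orders vanish)
    """
    a, b = x[:n], x[n:]
    res = [np.sum(a), np.sum(a * b) - 1.0]
    for k in range(2, m + 1):
        res.append(np.sum(a * b**k) / math.factorial(k))
    return np.array(res)

# Sanity check: the central difference (f(1) - f(-1)) / 2 is exact to order 2.
x = np.array([-0.5, 0.5, -1.0, 1.0])  # a = [-0.5, 0.5], b = [-1, 1]
order_conditions(x, n=2, m=2)  # -> [0., 0., 0.]
```

Feeding these residuals as `c(x) = 0` to an equality-constrained solver matches the problem class described above.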
Here's another. You can make up the `c_i` (random, preferably with some negative). It's a binary integer linear programming problem, essentially. I think it would trip up `slsqp` (if I remember correctly). You can also add constraints like
Hey Matt, can you try to send the problems again (probably in another way)? They do not appear for me.
Oops, I was sending attachments from my phone. Amateur.
The numerical differentiation problem referred to above:
The binary integer LP from above:
If you want to go pro, add the constraint Ax = b from:
```python
import numpy as np

def magic_square(n):
    np.random.seed(0)
    M = n * (n**2 + 1) / 2

    numbers = np.arange(n**4) // n**2 + 1
    numbers = numbers.reshape(n**2, n, n)

    zeros = np.zeros((n**2, n, n))

    A_list = []
    b_list = []

    # Rule 1: use every number exactly once
    for i in range(n**2):
        A_row = zeros.copy()
        A_row[i, :, :] = 1
        A_list.append(A_row.flatten())
        b_list.append(1)

    # Rule 2: only one number per square
    for i in range(n):
        for j in range(n):
            A_row = zeros.copy()
            A_row[:, i, j] = 1
            A_list.append(A_row.flatten())
            b_list.append(1)

    # Rule 3: sum of rows is M
    for i in range(n):
        A_row = zeros.copy()
        A_row[:, i, :] = numbers[:, i, :]
        A_list.append(A_row.flatten())
        b_list.append(M)

    # Rule 4: sum of columns is M
    for i in range(n):
        A_row = zeros.copy()
        A_row[:, :, i] = numbers[:, :, i]
        A_list.append(A_row.flatten())
        b_list.append(M)

    # Rule 5: sum of diagonals is M
    A_row = zeros.copy()
    A_row[:, range(n), range(n)] = numbers[:, range(n), range(n)]
    A_list.append(A_row.flatten())
    b_list.append(M)

    A_row = zeros.copy()
    A_row[:, range(n), range(-1, -n - 1, -1)] = numbers[:, range(n), range(-1, -n - 1, -1)]
    A_list.append(A_row.flatten())
    b_list.append(M)

    A = np.array(np.vstack(A_list), dtype=float)
    b = np.array(b_list, dtype=float)
    # c = np.zeros(A.shape[1], dtype=float)
    c = np.random.rand(A.shape[1])
    return A, b, c, numbers

n = 3
A, b, c, numbers = magic_square(n)
# [Your solver here. Be sure to add all the binary constraints xi*(xi - 1) = 0]
# sol = bintprog(c, A_eq=A, b_eq=b, options={"disp": True})
# Then, to recover the square from the solution:
# x = sol['x'].astype('int')  # you might need to modify this; the point is x is binary
# s = (numbers.flatten() * x).reshape(n**2, n, n)
# square = np.sum(s, axis=0)
# print(square)
```
Thanks a lot, Matt. I will try it out.
How about plotting function residuals (from the value given in the paper)? Maybe plot constraint violations as well. I mean, maybe they look better (unlikely, as you explained, given the really small trust-region radius).
Can you share the setup for the `ELEC` problem to experiment with your code? I want to understand better what is going on when the progress stops.
I have tried to plot the constraint violation and the function along the iterations. They don't add much information to the scenario.
The problem is that the solver reaches a point where the step length goes to `0` very quickly and it becomes impossible to get any further improvement. Maybe it is only because of roundoff errors, but nevertheless, it makes me uneasy.
I think the best approach now is to test the solver on lots of problems and see how it performs.
The code for `ELEC` and `CHANNEL`:
https://gist.github.com/antonior92/7d3355bcc1c5cc13eff5dbfa86b25a67
I tested the solver on several problems from CUTEst. Including the ones I have already mentioned, the solver has been tested on: `ELEC`, `CHANNEL`, `ORTHREGA`, `ORTHREGC`, `ORTHREGD`, `HAGER2`, `HAGER3`, `DTOC1ND`, `DTOC2`, `DTOC3`, `DTOC4`, `DTOC5`, `DTOC6`, `EIGENA2`, `EIGENC2`, `ARTIF`, `BRATU3D`.

I tried out these problems for a few different parameters and the solver converges to the solution in most of the cases. I found two problems where it fails to converge: `ORTHREGD` (only when you set `NPTS=2500`) and `DTOC1ND`.
https://gist.github.com/antonior92/3c19e4cf3002f56d0c7cc83c81156190
I will present the results in a more organised way later. For now, I intend to try to understand why the solver is failing on these two problems.
I think my concerns about the convergence on `ELEC` have been unfounded, since the solver is converging well enough on most of the problems.
So the plan for today:
First I will try to understand why the solver is failing on `ORTHREGD` and on `DTOC1ND` and see what I can do about it. Later I will implement a few unit tests (the problems Matt suggested and maybe problem 2 from COPS)...

Update: Problems that do not converge for the default initial setting may converge for other parameters. For instance:

- `ORTHREGD` (for `NPTS=2500` and `NPTS=5000`) fails to converge for the default initial setting. It converges for a different initial trust radius and initial penalty parameter.
- `ORTHREGD` (for any other `NPTS`) always converges.
- `DTOC1ND` fails to converge for the default initial setting. It converges for a different initial trust radius and initial penalty parameter.

You said `ORTHREGD` `npts=2500` was the largest possible setting, but you tried it with `npts=5000`?
This is really good, Antonio. And I was going to ask why you thought that the very steep drop in the plot you sent yesterday was not the appropriate convergence near the solution? True, it flatlines after that, but as you say it may be difficult to achieve any greater precision due to (normal) roundoff errors, for which there may not be much you can do. For instance, I constructed an innocuous-looking example for the numerical differentiation suite yesterday that thwarts attempts to achieve good relative error and also prevents good absolute error (regardless of settings used) by forcing high roundoff error and truncation error simultaneously. It may just be a nasty problem for this particular algorithm.
It sounds like you don't need any other example problems, so you can ignore the ones I sent yesterday for now if you want to move on. (I was skimming your email and only saw the part that said you were having trouble finding equality-constrained problems, but not the line that said you found plenty.) They might be good to include in the unit tests, though, because I'm pretty sure I can give you simple variants of those for which `slsqp` fails.
Hey Matt, the largest possible setting is actually `npts=5000`. I edited the comment right after posting it, but you probably only got the first version via email.
Furthermore, I am interested in the problems you sent me. I will try to implement them.
Ugh, one step behind again : ) I'll try to get up to speed! Along those lines, would you be available for a voice chat sometime this week? I don't mean to hinder your progress to help me catch up, but if you need a break from the coding, I'd appreciate it. Nikolay, you are of course invited, too, but I understand if you can't make it as you are up to date!
Hey Matt, this is actually my fault. I always end up editing my comments afterwards. If you take a look at this issue I have edited 9 of my 13 comments :)
It would be nice to voice chat this week. For me it could be Thursday (tomorrow) or Friday, some time between 11:00 and 15:00 UTC or between 17:00 and 21:00 UTC. What do you think Matt? And what about you Nikolay?
How about 17:30 UTC tomorrow (Thursday)? On google hangouts, I'm matt.haberland.
It is good for me.
I think that's what buffer/wrap-up time is for, and you have some. I agree with the idea of moving on to the full solver.
On Jun 29, 2017 5:22 AM, "Antonio Horta Ribeiro" notifications@github.com wrote:
Hey Nikolay and Matt,
Yesterday I was testing the solver on problem 15.4 from Nocedal and Wright. I found a very interesting result: when starting from point `[0.7, 0.7]` with initial penalty `1`, the solver gets stuck because the penalty is not large enough.
https://gist.github.com/antonior92/914e3d3eb056fd5f3c640ab37ced04cf
It is actually easy to understand why. The step is accepted or rejected based on the merit function variation:

```
merit_function - merit_function_next = (fun - fun_next)
                                     + penalty*(norm(constr) - norm(constr_next))
```

What is happening is that the objective is increasing (so `fun - fun_next < 0`) while the constraint violation is decreasing (so `norm(constr) - norm(constr_next) > 0`). But since the penalty is not high enough:

```
merit_function - merit_function_next < 0
```

and the step is rejected. This happens for every step, no matter how small, and because of this the problem fails to converge.
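A minimal sketch of the acceptance test described above (the names and the wrapper function are illustrative, not the actual implementation):

```python
import numpy as np

def accept_step(fun, fun_next, constr, constr_next, penalty):
    """Accept a step iff the merit function decreases.

    merit(x) = f(x) + penalty * ||c(x)||, so the step is accepted when
    (fun - fun_next) + penalty * (||constr|| - ||constr_next||) > 0.
    """
    merit_decrease = (fun - fun_next) + penalty * (
        np.linalg.norm(constr) - np.linalg.norm(constr_next))
    return merit_decrease > 0

# A step that trades a small objective increase for a large feasibility
# gain is rejected when the penalty is too small:
accept_step(1.0, 1.1, np.array([1.0]), np.array([0.1]), penalty=0.5)   # True
accept_step(1.0, 1.1, np.array([1.0]), np.array([0.1]), penalty=0.05)  # False
```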
I tried out some heuristics for selecting the penalty from [2]. What worked better is a simple safeguard I devised (but I am not entirely satisfied with it either). I will commit it on a separate branch and later we will see how to proceed about this.
Anyway, I have already spent too much time on this SQP solver (I always keep finding details I want to improve). I will do some last modifications on it, commit the unit tests I have implemented (including the `ELEC` implementation Nikolay has sent me) and move on.
[1] Nocedal, Jorge, and Stephen J. Wright. "Numerical optimization" Second Edition (2006). http://www.bioinfo.org.cn/%7Ewangchao/maa/Numerical_Optimization.pdf
[2] Lalee, Marucha, Jorge Nocedal, and Todd Plantenga. "On the implementation of an algorithm for large-scale equality constrained optimization." SIAM Journal on Optimization 8.3 (1998): 682-706. https://pdfs.semanticscholar.org/f60c/96056858acc5a582587d19b065fd72175daa.pdf
Hi Antonio,
I was trying to run the unit tests this morning. It seemed that the top-level `ipsolver` was not in a working state (which is fine, as it's under development). For instance, the unit test only provided 8 of the 14 required inputs (you must have added the separate bounds and linear constraints recently without default values); there was a typo where `nslack` should have been `n_slack`; `ipsolver` invokes a method `update` that `BarrierSubproblem` doesn't have, etc... No big deal. Anyway, just let me know when it's in good shape to try again.
I also tried the EQP solver and found that while `ProblemELEC` seems to work, `ProblemMaratos` doesn't. Should it? I didn't look too deeply, but it looks like it's maxing out on iterations and optimality is at `3.0` (rather than `< 1e-5`).
Hey Matt, the unit tests were working for me on commit "Simple test for ipsolver (together with bug corrections)." After that I changed the interface, breaking a lot of things. I am working on it right now, sorry about that.
No problem. And I found the issue with EQP: `degrees` is specified as an integer, so the integer division by `180` gave `0` for me in Python 2. You should include the usual `from __future__` imports at the top.
(Never mind the bit about `v0` from the original post.)
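For reference, the pitfall mentioned above in a self-contained form (the numbers are illustrative, not the actual test values):

```python
# In Python 2, `90 / 180` is integer (floor) division and evaluates to 0,
# which silently zeroes out an angle conversion like degrees/180 * pi.
# The __future__ import makes `/` true division in Python 2 as well,
# matching the Python 3 behaviour shown here.
from __future__ import division

print(90 / 180)  # 0.5
```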
I was trying to use the EQP solver for minimization of Rosenbrock function (unconstrained), but the EQP solver seems to need constraints. I'm sure you'll get to that at some point; just thought I'd record it here.
Hey Matt, there are some rough edges on those functions. I don't check arguments because they are internal functions and no user will actually have to access them.
I have been testing the unit tests mostly on Python 3; that is why I missed the division issue you just pointed out.
I defined defaults for the function `ipsolver` and all the unit tests seem to be working for the latest commit.
The solver right now requires at least one constraint. I haven't decided how to proceed about it yet. I can make modifications to allow unconstrained problems as well (it shouldn't be very hard), or I can just raise an error telling the user to try an unconstrained solver.
Thanks! I guess I just need to try it with harder problems :-)
Results of the SQP solver on the test problem `ELEC`.

This problem is described in [1] (problem 2) and consists of the following: given `np` electrons, find the equilibrium state distribution (of minimal Coulomb potential) of the electrons positioned on a conducting sphere. The performance of other commercial solvers (according to [1]):

![screenshot 2017-06-22 18 04 34](https://user-images.githubusercontent.com/16557411/27455919-5e9e174a-5775-11e7-8236-5e829c2f0bcd.png)

I chose this problem because the KNITRO package performs well on it, so our solver is expected to perform well on it as well, since the underlying principle is the same.

For `np = 200` the performance of our implementation is:

As you can see, the execution time is very competitive and the final value of `f` seems very close to the one obtained by the KNITRO package. The constraint violation also seems pretty OK.

I don't know how the optimality is defined in the previous table, but I defined it as the norm of the Lagrangian gradient `||grad L(x, lambda)||`. I am not able to reduce this optimality below `9.6e-3`; after that the algorithm step gets too small and I cannot get more progress than that. In the table from the paper, KNITRO seems to be able to reduce the optimality down to `3.9e-7`, but again, maybe it is only because of different definitions. Besides that, the decay of this optimality measure `||grad L(x, lambda)||` seems pretty consistent along the iterations.

My initial desire to design small tests of my own has not been very successful, mostly because I did not manage to think of meaningful tests by myself. So I have been focusing on performing tests on problems from the CUTEst collection. Nevertheless, I want to include some unit tests for the method. What do you guys think would be the best option in this case? Implement problems from CUTEst in Python and use them for the unit tests?
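The optimality measure `||grad L(x, lambda)||` can be sketched as follows (a minimal illustration with my own sign convention `L = f + lambda^T c`; the function name is mine):

```python
import numpy as np

def optimality(grad_f, jac_c, lam):
    """||grad_x L(x, lambda)|| for min f(x) s.t. c(x) = 0,
    with L(x, lambda) = f(x) + lambda^T c(x).

    grad_f : gradient of f at x, shape (n,)
    jac_c  : Jacobian of c at x, shape (m, n)
    lam    : Lagrange multiplier estimate, shape (m,)
    """
    return np.linalg.norm(grad_f + jac_c.T @ lam)

# At a KKT point the measure vanishes: for f(x) = x1, c(x) = x1 - 1,
# we have grad f = [1, 0], J = [[1, 0]] and lambda = -1.
optimality(np.array([1.0, 0.0]), np.array([[1.0, 0.0]]), np.array([-1.0]))  # 0.0
```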
[1] Dolan, Elizabeth D., Jorge J. Moré, and Todd S. Munson. "Benchmarking optimization software with COPS 3.0." Argonne National Laboratory Research Report (2004).