coin-or / Ipopt

COIN-OR Interior Point Optimizer IPOPT
https://coin-or.github.io/Ipopt

Ipopt giving nan for objective function #703

Closed bharswami closed 11 months ago

bharswami commented 1 year ago

I am trying to interface SPRAL with IPOPT in Matlab. In debug mode, I found that the objective function and primal constraints became NaN. This should be because the variable "x" itself becomes NaN. What could be the reason? The optimization problem: minimize sin(x)^2, constraints: -4 <= x <= 8, subject to: 1 <= x <= 10.

bharswami commented 1 year ago

To give more perspective: the error occurs in IpPDSearchDirCalc.cpp at

SmartPtr<IteratesVector> rhs = IpData().curr()->MakeNewContainer();
rhs->Set_x(*IpCq().curr_grad_lag_with_damping_x());

The value being set is NaN.

This is because mu in IpAdaptiveMuUpdate.cpp is being set to NaN at IpData().Set_mu(mu);

mu becomes NaN due to the following sequence of function calls:

I am not sure if this is a bug, since z_L and z_U are set to zero at lines 794 to 811 in file IpIpoptAlg.cpp, quoted below:

SmartPtr<Vector> y_c = IpData().curr()->y_c()->MakeNew();
SmartPtr<Vector> y_d = IpData().curr()->y_d()->MakeNew();
bool retval = eq_multiplier_calculator_->CalculateMultipliers(*y_c, *y_d);
if( retval )
{
   SmartPtr<const IteratesVector> curr = IpData().curr();
   SmartPtr<IteratesVector> iterates = curr->MakeNewContainer();
   iterates->Set_x(*curr->x());
   iterates->Set_s(*curr->s());
   iterates->Set_z_L(*curr->z_L());
   iterates->Set_z_U(*curr->z_U());
   iterates->Set_v_L(*curr->v_L());
   iterates->Set_v_U(*curr->v_U());
   iterates->Set_y_c(*y_c);
   iterates->Set_y_d(*y_d);
   IpData().set_trial(iterates);
   IpData().AcceptTrialPoint();
}
svigerske commented 1 year ago

The code does not expect avrg_compl to be 0. That would mean a point at the boundary, which one shouldn't have for an interior point algorithm. So I don't really see how it happened that "z_L and z_U are zero" here. The code in "lines 794 to 811 in file IpIpoptAlg.cpp" doesn't set them explicitly to 0, but just copies the values from the current iterate.

The paper https://optimization-online.org/2005/03/1089/ explains the idea behind computing mu in QualityFunctionMuOracle::CalculateMu(). However, eventually mu is always set to a multiple of avrg_compl, so the latter should not be zero.
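For intuition, here is a minimal sketch of what the average complementarity is; the names n_bounds, slack, and mult are illustrative, not Ipopt's actual identifiers:

// Average complementarity: mean of the products s_i * z_i over all
// bound slacks s_i and their multipliers z_i. At a strictly interior
// point every product is positive, so the average cannot be zero.
Number avrg_compl = 0.0;
for( Index i = 0; i < n_bounds; i++ )
{
   avrg_compl += slack[i] * mult[i];
}
avrg_compl /= (Number) n_bounds;
// If z_L and z_U were all zero, avrg_compl would be 0, and any mu set
// to a multiple of it would collapse to 0 as well.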

Since the problem is very small (which may be the cause for odd things to happen), you could run Ipopt with option print_level set to 12 and attach the log here. Maybe that helps to see what is going on.

bharswami commented 1 year ago

Thanks Stefan. I am attaching the log file here. I can't make much of it, but hopefully you will be able to see what's going on. ipopt_log.txt

bharswami commented 1 year ago

I have put checks at several points in the code to guard against singularities, divide-by-zeros, and NaNs/Infs.

bharswami commented 1 year ago

This is the Matlab code executing Ipopt:

opts.ipopt.spral_use_gpu = 'no';
opts.ipopt.spral_pivot_method = 'block';
opts.ipopt.spral_order = 'metis';
opts.ipopt.spral_print_level = 0;
opts.ipopt.spral_scaling = 'matching';
opts.ipopt.spral_ignore_numa = 'yes';
opts.ipopt.jac_d_constant = 'yes';
opts.ipopt.hessian_constant = 'yes';
opts.ipopt.max_iter = 100;
opts.ipopt.tol = 1e-6; % opts.ipopt.limited_memory_update_type = 'bfgs' ; % {bfgs}, sr1 = 6; % {6}
opts.ipopt.hessian_approximation = 'limited-memory';
opts.ipopt.derivative_test = 'first-order';
opts.ipopt.print_level = 1;
opts.ipopt.linear_solver = 'spral';
opts.ipopt.nlp_upper_bound_inf = 1e6;
opts.ipopt.nlp_lower_bound_inf = -1e6;
opts.lb = -10;
opts.ub = 10;
opts.cl = -10;
opts.cu = 10;
opts.zl = -10;
opts.zu = 10;
funcs.objective = @(x) (x-1)^2;
funcs.gradient = @(x) 2*(x-1);
funcs.constraints = @(x) x;
funcs.jacobian = @(x) sparse(1);
funcs.jacobianstructure = @(x) sparse(1);
funcs.hessian = @(x) sparse(0);
funcs.hessianstructure = @(x) sparse(1);
[x,fval] = ipopt_libs(0,funcs,opts)

svigerske commented 1 year ago

This log doesn't seem to belong to this issue. The places in the code that you mention are executed only if mu_strategy is set to adaptive, but the log doesn't mention this option setting. Further, there are no NaNs in the log; it just stops at the iteration limit.

Further, the Matlab code you posted now also seems to solve a different NLP, and doesn't change mu_strategy either.

bharswami commented 1 year ago

I had put checks at several places in the code to detect NaNs and circumvent them. "The places in the code that you mention are executed only if mu_strategy is set to adaptive" - are you referring to the avrg_compl issue? That code is definitely being hit in this run.

svigerske commented 1 year ago

"The place in the code that you mention are executed only if mu_strategy is set to adaptive, " - are you referring to the avrg_compl issue? It is definitely being hit in this run

Yes. But I don't see this in the log. You said the variable x becomes NaN, but a grep -F "curr_x[" ipopt_log.txt doesn't show any problematic numbers for the iterate. No NaN in the log at all.

bharswami commented 1 year ago

Here's an example of what I am doing. It is out of desperation, I admit:

SmartPtr<IteratesVector> zerotrial = IpData().trial()->MakeNewIteratesVectorCopy();
zerotrial->Set(0.0);
xtemp = IpData().curr()->x();
trialx = IpData().trial()->x();
Number val;
mu = IpData().curr_mu();
val = IpData().trial()->Min();
if( (val - val) != 0 || abs(val) > 1e100 )
   IpData().set_trial(zerotrial); // Here's where I am checking whether trial is NaN or inf.

I believe this trial is what is assigned to x. I can remove all these NaN circumventions and send you the "actual log".
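(Aside, not part of the original comment: the (val - val) != 0 trick does detect NaN, but a more idiomatic check is std::isfinite. A minimal sketch, assuming val is a plain double-typed Number:

#include <cmath>

// Rejects NaN and +/-Inf in one call instead of arithmetic tricks.
if( !std::isfinite(val) )
   IpData().set_trial(zerotrial);

Ipopt also ships a helper with the same intent, IsFiniteNumber() in IpUtils.hpp.)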

bharswami commented 1 year ago

Here's the log for the problem below, run from the unaltered repository. You will see the NaNs in this. ipopt_log.txt

opts = [];
funcs = [];
opts.ipopt.spral_use_gpu = 'no';
opts.ipopt.spral_pivot_method = 'block';
opts.ipopt.spral_order = 'metis';
opts.ipopt.spral_print_level = 0;
opts.ipopt.spral_scaling = 'matching';
opts.ipopt.spral_ignore_numa = 'yes';
opts.ipopt.jac_d_constant = 'yes';
opts.ipopt.hessian_constant = 'yes';
opts.ipopt.max_iter = 100;
opts.ipopt.tol = 1e-6; % opts.ipopt.limited_memory_update_type = 'bfgs' ; % {bfgs}, sr1 = 6; % {6}
opts.ipopt.hessian_approximation = 'limited-memory';
opts.ipopt.derivative_test = 'first-order';
opts.ipopt.print_level = 12;
opts.ipopt.linear_solver = 'spral';
opts.lb = 0;
opts.ub = 1;
opts.cl = -10;
opts.cu = 10;
opts.zl = -10;
opts.zu = 10;
funcs.objective = @(x) (x-1)^2;
funcs.gradient = @(x) 2*(x-1);
funcs.constraints = @(x) x;
funcs.jacobian = @(x) sparse(1);
funcs.jacobianstructure = @(x) sparse(1);
funcs.hessian = @(x) sparse(0);
funcs.hessianstructure = @(x) sparse(1);
[x,fval] = ipopt_libs(0,funcs,opts)

bharswami commented 1 year ago

The above log file I sent you is from the unaltered repository. I am also attaching the log from the unaltered repository with limits [-10,10] on the variable x. ipoptlog.txt

bharswami commented 1 year ago

Hi Stefan. It would be great if you could help me with these issues. I am trying to learn as I build the code and get a MEX file with SPRAL ready.

bharswami commented 1 year ago

Can you at least confirm whether there indeed is a bug in Ipopt?

svigerske commented 12 months ago

Since I can't see from the log where the NaN for the barrier function comes from, I will try to reproduce with the C++ interface. But something is odd in your log.

First, you have

DenseVector "original x_L unscaled" with 1 elements:
original x_L unscaled[    1]= 0.0000000000000000e+00
DenseVector "original x_U unscaled" with 1 elements:
original x_U unscaled[    1]= 1.0000000000000000e+00
DenseVector "original d_L unscaled" with 1 elements:
original d_L unscaled[    1]=-1.0000000000000000e+01
DenseVector "original d_U unscaled" with 1 elements:
original d_U unscaled[    1]= 1.0000000000000000e+01
DenseVector "modified x_L scaled" with 1 elements:
modified x_L scaled[    1]= 0.0000000000000000e+00
DenseVector "modified x_U scaled" with 1 elements:
modified x_U scaled[    1]= 1.0000000000000000e+00
DenseVector "modified d_L scaled" with 1 elements:
modified d_L scaled[    1]=-1.0000000000000000e+01
DenseVector "modified d_U scaled" with 1 elements:
modified d_U scaled[    1]= 1.0000000000000000e+01

But Ipopt actually relaxes variable bounds and the sides of inequalities by default (see the sketch after the next listing). The modified bounds should be

DenseVector "modified x_L scaled" with 1 elements:
modified x_L scaled[    1]=-1.0000000000000000e-08
DenseVector "modified x_U scaled" with 1 elements:
modified x_U scaled[    1]= 1.0000000099999999e+00
DenseVector "modified d_L scaled" with 1 elements:
modified d_L scaled[    1]=-1.0000000099999999e+01
DenseVector "modified d_U scaled" with 1 elements:
modified d_U scaled[    1]= 1.0000000099999999e+01
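For reference, the relaxation producing the numbers above is controlled by the bound_relax_factor option (default 1e-8): each finite bound is pushed outward by bound_relax_factor * max(1, |bound|). A minimal sketch with illustrative names:

// Sketch of Ipopt's default bound relaxation (bound_relax_factor = 1e-8):
//   0 -> -1e-8,  1 -> 1 + 1e-8,  10 -> 10 + 1e-7,
// which matches the "modified" values shown above.
Number tau = 1e-8;
Number xL_relaxed = xL - tau * std::max(Number(1.), std::abs(xL));
Number xU_relaxed = xU + tau * std::max(Number(1.), std::abs(xU));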

I don't see how this could be skipped, since you didn't set the corresponding options to skip it.

More importantly, the problem statistics written to the log seem wrong:

Total number of variables............................:        1
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equality constraints.................:        0
Total number of inequality constraints...............:        1
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0

The variable has both lower and upper bounds, and the constraint also has both lower and upper bounds, so neither count should be zero.

Can you check again that you have no modifications in the Ipopt code?

bharswami commented 12 months ago

I will check and get back to you. I am using some cpp files from Jonathan Currie's version of Ipopt (for Matlab). I will download the coin-or Ipopt repo again and run.

bharswami commented 12 months ago

Do you know where I can get the latest Matlab interface files for Ipopt?

svigerske commented 12 months ago

The only maintained Matlab interface to Ipopt I know of is https://github.com/ebertolazzi/mexIPOPT

bharswami commented 12 months ago

But that doesn't have the cpp files? I need those to build.

svigerske commented 12 months ago

I think that the source files for his Matlab/Ipopt interface are at https://github.com/ebertolazzi/mexIPOPT/tree/master/toolbox/src. I cannot say how to build this. I don't have Matlab.

bharswami commented 12 months ago

Stefan, thanks for your help. The issue was that the BLAS library files weren't properly interfaced with the IPOPT code. Now the optimizer runs without any NaNs but doesn't converge. Please look at the following example:

opts = [];
funcs = [];
opts.ipopt.spral_use_gpu = 'no';
opts.ipopt.spral_pivot_method = 'block';
opts.ipopt.spral_order = 'metis';
opts.ipopt.spral_print_level = 0;
opts.ipopt.spral_scaling = 'matching';
opts.ipopt.spral_ignore_numa = 'yes';
opts.ipopt.jac_d_constant = 'no';
opts.ipopt.hessian_constant = 'no';
opts.ipopt.max_iter = 1000;
opts.ipopt.tol = 1e-6; % opts.ipopt.limited_memory_update_type = 'bfgs' ; % {bfgs}, sr1 = 6; % {6}
opts.ipopt.hessian_approximation = 'limited-memory';
opts.ipopt.derivative_test = 'first-order';
opts.ipopt.print_level = 12;
opts.ipopt.linear_solver = 'spral';
opts.lb = [-10];
opts.ub = [10];
opts.cl = -10;
opts.cu = 10;
funcs.objective = @(x) (x-2)^2;
funcs.gradient = @(x) 2*(x-2);
funcs.constraints = @(x) x;
funcs.jacobian = @(x) sparse(1);
funcs.jacobianstructure = @(x) sparse(1);
funcs.hessian = @(x) sparse(0);
funcs.hessianstructure = @(x) sparse(1);
[x,fval] = ipopt_libs(2,funcs,opts)
diary off

The answer is x = 2, but the optimizer doesn't converge. Please have a look at the log file. What is preventing the optimizer from converging? ipopt_log_x.txt

Also have a look at an example with 2 variables:

diary ipopt_log
opts = [];
funcs = [];
opts.ipopt.spral_use_gpu = 'no';
opts.ipopt.spral_pivot_method = 'block';
opts.ipopt.spral_order = 'metis';
opts.ipopt.spral_print_level = 0;
opts.ipopt.spral_scaling = 'matching';
opts.ipopt.spral_ignore_numa = 'yes';
opts.ipopt.jac_d_constant = 'no';
opts.ipopt.hessian_constant = 'no';
opts.ipopt.max_iter = 1000;
opts.ipopt.tol = 1e-6; % opts.ipopt.limited_memory_update_type = 'bfgs' ; % {bfgs}, sr1 = 6; % {6}
opts.ipopt.hessian_approximation = 'limited-memory';
opts.ipopt.derivative_test = 'first-order';
opts.ipopt.print_level = 12;
opts.ipopt.linear_solver = 'spral';
opts.lb = [-10;-10];
opts.ub = [10;10];
opts.cl = -10;
opts.cu = 10;
funcs.objective = @(x) ((x(1)-1)^2+(x(2)-2)^2);
funcs.gradient = @(x) 2*[x(1)-1 x(2)-2];
funcs.constraints = @(x) x(1);
funcs.jacobian = @(x) sparse([1 0]);
funcs.jacobianstructure = @(x) sparse([1 1]);
funcs.hessian = @(x) sparse([0 0;0 0]);
funcs.hessianstructure = @(x) sparse([1 1;1 1]);
[x,fval] = ipopt_libs([3;3],funcs,opts)
diary off

The solution is x = [1;2], but it doesn't seem to converge either. Log file attached: ipopt_log_x1_x2.zip. Please suggest. Thanks.

svigerske commented 12 months ago

I would think that there might be an issue with Spral. Do you get the same problem when using a different linear solver, e.g., Mumps?

This is the log I get for the first NLP: log_x.txt When I grep for residual_ratio, it seems that all linear systems are solved fine:

residual_ratio = 1.161487e-18
residual_ratio = 1.708076e-19
residual_ratio = 6.936164e-15
residual_ratio = 1.601423e-16
residual_ratio = 1.216627e-15
residual_ratio = 4.818324e-17
residual_ratio = 2.960622e-15
residual_ratio = 4.131101e-17
residual_ratio = 2.701665e-14
residual_ratio = 1.245868e-17
residual_ratio = 1.946965e-13
residual_ratio = 5.008270e-17

But in your log, things go weird after the first two:

residual_ratio = 1.161487e-18
residual_ratio = 1.708076e-19
residual_ratio = 1.535039e-01
residual_ratio = 1.531841e-01
residual_ratio = 1.528650e-01
residual_ratio = 1.525466e-01
residual_ratio = 1.522288e-01
residual_ratio = 1.519116e-01
residual_ratio = 1.515952e-01
residual_ratio = 1.512794e-01
residual_ratio = 1.509642e-01
residual_ratio = 1.506497e-01
residual_ratio = 1.503359e-01
residual_ratio = 1.500227e-01
Iterative refinement failed with residual_ratio = 1.500227e-01
...

Maybe increasing spral_print_level will give some insight.

bharswami commented 12 months ago

Running it with Mumps (from the available Matlab interface for Ipopt) works fine. I will dig more into this residual ratio.

bharswami commented 12 months ago

Do you know where to look for the bug in SPRAL? SPRAL needs the METIS package.

svigerske commented 12 months ago

Not really. If there was an issue with linking BLAS to Ipopt, maybe something similar happened with your build of Spral. As said, increasing the print level for Spral may give some hint as to what is going on there.

bharswami commented 12 months ago

I tried increasing the print level, but I didn't see any increase in the print output. I am attaching it in case you are able to spot anything. ipopt_log_x_spral_print.txt

bharswami commented 12 months ago

What does the residual_ratio exactly indicate?

svigerske commented 12 months ago

I tried increasing the print level, but I didn't see any increase in the print output.

The output from Spral isn't going through the message handling of Ipopt, I think. So it goes to stdout. Maybe there is some console where something is printed.

What does the residual_ratio exactly indicate?

It is the scaled violation of the equation system that has been solved by the linear solver (Spral) (follow code at https://github.com/coin-or/Ipopt/blob/stable/3.14/src/Algorithm/IpPDFullSpaceSolver.cpp#L246-L249). So after solving Ax=b, it computes ||Ax-b|| and then divides this by some quantity in case A, x, or b are huge.

The scaling doesn't matter so much here. The lines around this in your log look like

max-norm resid_x  1.422677e+00
max-norm resid_s  5.551115e-17
max-norm resid_c  0.000000e+00
max-norm resid_d  0.000000e+00
max-norm resid_zL 0.000000e+00
max-norm resid_zU 0.000000e+00
max-norm resid_vL 0.000000e+00
max-norm resid_vU 0.000000e+00
nrm_rhs = 4.53e+00 nrm_sol = 7.76e-01 nrm_resid = 1.42e+00
residual_ratio = 2.683456e-01

resid_x is one component of Ax-b and it is just too high, indicating that the equation system wasn't solved correctly.
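For intuition, here is a minimal sketch of how such a ratio can be formed from the norms printed above; the numbers are taken from the log excerpt, and the exact safeguard terms in Ipopt's PDFullSpaceSolver may differ:

// Scaled residual of the linear system K*sol = rhs (illustrative only).
Number nrm_rhs   = 4.53;    // max-norm of the right-hand side (from the log)
Number nrm_sol   = 0.776;   // max-norm of the computed solution
Number nrm_resid = 1.42;    // max-norm of K*sol - rhs
Number residual_ratio = nrm_resid / (1e-10 + nrm_rhs + nrm_sol);   // ~0.268
// Values near machine precision (1e-15 and below) mean the solve is
// accurate; values like 0.15-0.27, as in the failing log, mean the
// factorization or solve is badly wrong.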

bharswami commented 12 months ago

I fixed some bugs w.r.t. interfacing BLAS routines with Ipopt. I tried the following problem:

opts = [];
funcs = [];
opts.ipopt.spral_use_gpu = 'no';
opts.ipopt.spral_pivot_method = 'block';
opts.ipopt.spral_order = 'metis';
opts.ipopt.spral_print_level = 0;
opts.ipopt.spral_scaling = 'none';
opts.ipopt.spral_ignore_numa = 'no';
opts.ipopt.jac_d_constant = 'no';
opts.ipopt.hessian_constant = 'no';
opts.ipopt.max_iter = 1000;
opts.ipopt.tol = 1e-6; % opts.ipopt.limited_memory_update_type = 'bfgs' ; % {bfgs}, sr1 = 6; % {6}
opts.ipopt.hessian_approximation = 'limited-memory';
opts.ipopt.derivative_test = 'first-order';
opts.ipopt.print_level = 12;
opts.ipopt.linear_solver = 'spral';
opts.lb = [-10;-10];
opts.ub = [10;10];
opts.cl = -10;
opts.cu = 10;
funcs.objective = @(x) sin(x(1)*x(2)-8)^2;
funcs.gradient = @(x) sin(2*x(1)*x(2)-16)*[x(2) x(1)];
funcs.constraints = @(x) x(1);
funcs.jacobian = @(x) sparse([1 0]);
funcs.jacobianstructure = @(x) sparse([1 1]);
funcs.hessian = @(x) sparse([0 0;0 0]);
funcs.hessianstructure = @(x) sparse([1 1;1 1]);
[x,fval] = ipopt_libs([1;1],funcs,opts);

It is giving an "Optimal Solution Found", but it takes too many iterations and the residual ratio still moves away from zero before iterative refinement. Log file attached. Do you have any idea why? ipopt__sine_log.txt

bharswami commented 12 months ago

Another simple problem, where the point is almost feasible but there is an error when calling the restoration phase:

opts.ipopt.spral_use_gpu = 'no';
opts.ipopt.spral_pivot_method = 'block';
opts.ipopt.spral_order = 'metis';
opts.ipopt.spral_print_level = 0;
opts.ipopt.spral_scaling = 'none';
opts.ipopt.spral_ignore_numa = 'no';
opts.ipopt.jac_d_constant = 'no';
opts.ipopt.hessian_constant = 'no';
opts.ipopt.max_iter = 1000;
opts.ipopt.tol = 1e-6; % opts.ipopt.limited_memory_update_type = 'bfgs' ; % {bfgs}, sr1 = 6; % {6}
opts.ipopt.hessian_approximation = 'limited-memory';
opts.ipopt.derivative_test = 'first-order';
opts.ipopt.print_level = 5;
opts.ipopt.linear_solver = 'spral';
opts.lb = [-10;-10];
opts.ub = [10;10];
opts.cl = -10;
opts.cu = 10;
funcs.objective = @(x) sin(x(1)*x(2)^2-8)^2;
funcs.gradient = @(x) sin(2*x(1)*x(2)^2-16)*[x(2)^2 2*x(2)*x(1)];
funcs.constraints = @(x) x(1);
funcs.jacobian = @(x) sparse([1 0]);
funcs.jacobianstructure = @(x) sparse([1 1]);
funcs.hessian = @(x) sparse([0 0;0 0]);
funcs.hessianstructure = @(x) sparse([1 1;1 1]);
[x,fval] = ipopt_libs([1;1],funcs,opts);

ipopt_log_resto_err.txt

svigerske commented 12 months ago

I still cannot reproduce. Maybe you want to try without Matlab first and use the C++ interface of Ipopt directly, just to reduce the complexity of the setup and to be able to see the Spral log.

Also, you can try switching to Mumps again. If that works, then just stay with Mumps. Or see if you can use HSL codes.

Here is the log I get for the sin(x(1)*x(2)-8)^2 objective function: ipopt__sine_log_cpp.txt. It solves in 6 iterations.

And here is the source file for that example: cpp_src.zip
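(For readers without the attachment: below is a minimal sketch of what a C++ TNLP for this NLP could look like; the class name SineNLP and the option choices are illustrative, and the attached cpp_src.zip may differ. It uses the limited-memory Hessian approximation, so eval_h is not implemented.)

#include "IpIpoptApplication.hpp"
#include "IpTNLP.hpp"
#include <cmath>

using namespace Ipopt;

// min sin(x1*x2 - 8)^2  s.t.  -10 <= x1 <= 10 (one inequality),
// variable bounds [-10,10], starting point (1,1).
class SineNLP: public TNLP
{
public:
   bool get_nlp_info(Index& n, Index& m, Index& nnz_jac_g, Index& nnz_h_lag, IndexStyleEnum& index_style)
   {
      n = 2;           // two variables
      m = 1;           // one constraint g(x) = x1
      nnz_jac_g = 1;   // single structural nonzero in the Jacobian
      nnz_h_lag = 0;   // no Hessian (limited-memory approximation)
      index_style = C_STYLE;
      return true;
   }

   bool get_bounds_info(Index n, Number* x_l, Number* x_u, Index m, Number* g_l, Number* g_u)
   {
      x_l[0] = x_l[1] = -10.;
      x_u[0] = x_u[1] = 10.;
      g_l[0] = -10.;
      g_u[0] = 10.;
      return true;
   }

   bool get_starting_point(Index n, bool init_x, Number* x, bool init_z, Number* z_L, Number* z_U, Index m, bool init_lambda, Number* lambda)
   {
      x[0] = x[1] = 1.;
      return true;
   }

   bool eval_f(Index n, const Number* x, bool new_x, Number& obj_value)
   {
      Number s = std::sin(x[0] * x[1] - 8.);
      obj_value = s * s;
      return true;
   }

   bool eval_grad_f(Index n, const Number* x, bool new_x, Number* grad_f)
   {
      // d sin(u)^2 = sin(2u) du, with u = x1*x2 - 8
      Number s2 = std::sin(2. * (x[0] * x[1] - 8.));
      grad_f[0] = s2 * x[1];
      grad_f[1] = s2 * x[0];
      return true;
   }

   bool eval_g(Index n, const Number* x, bool new_x, Index m, Number* g)
   {
      g[0] = x[0];
      return true;
   }

   bool eval_jac_g(Index n, const Number* x, bool new_x, Index m, Index nele_jac, Index* iRow, Index* jCol, Number* values)
   {
      if( values == NULL )
      {  // structure-only call
         iRow[0] = 0;
         jCol[0] = 0;
      }
      else
         values[0] = 1.;
      return true;
   }

   void finalize_solution(SolverReturn status, Index n, const Number* x, const Number* z_L, const Number* z_U, Index m, const Number* g, const Number* lambda, Number obj_value, const IpoptData* ip_data, IpoptCalculatedQuantities* ip_cq)
   { }
};

int main()
{
   SmartPtr<TNLP> nlp = new SineNLP();
   SmartPtr<IpoptApplication> app = IpoptApplicationFactory();
   app->Options()->SetStringValue("hessian_approximation", "limited-memory");
   app->Options()->SetStringValue("linear_solver", "spral");
   if( app->Initialize() == Solve_Succeeded )
      app->OptimizeTNLP(nlp);
   return 0;
}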

bharswami commented 12 months ago

This is with the SPRAL solver? I am able to solve more complex systems in the same number of iterations as with the MUMPS and MA57 solvers (examples given in mexIPOPT). I really want to build this with SPRAL and get a MEX file ready - I need that experience.

bharswami commented 12 months ago

I will try with the cpp interface and see if I am able to get the desired output. The cpp source you sent is for the C++ interface of Ipopt?

bharswami commented 12 months ago

My CPP test did not go well. Although the project built, I keep getting errors that some DLLs are missing. I did a text comparison of the log files that I have with the one you sent me. Lines 2600-2800 are a bit weird. There seems to be some extra data in your log. The values also start to differ here. Could you point out why the values are different? Thanks. ipopt__sine_log_cpp_stefan.txt ipopt_log_new.txt

svigerske commented 12 months ago

The first significant difference is around line 2784, in the approximated Hessian. I got

    W-U[ 1][    1]=-1.4490164783310873e+00
    W-U[ 1][    2]=-1.3681535825665503e+00

and you got

    W-U[ 1][    1]=-3.2039340808998262e-01
    W-U[ 1][    2]=-1.7771074456272162e-01

But I cannot say whether this is a result of the tiny difference before, or something was calculated wrongly.

bharswami commented 12 months ago

What about line 2776 (my log) and 2765 (your log)? The "homogenous element, all elements have value....." value is different. I found that the BFGS update (the function calling IpLapackPotrf) fails on some Cholesky factorizations. Is this a significant factor?

bharswami commented 12 months ago

I find that the matrix sent for Cholesky factorization (to "dpotrf") has some "junk" values which should have been set to zero. This has to be done while initializing the matrix. But where? For example, in IpLowRankAugSystemSolver.cpp:

line 323: bool retchol = J1->ComputeCholeskyFactor(*M1);
line 387: bool retchol = J2->ComputeCholeskyFactor(*M2);

svigerske commented 11 months ago

What about line 2776 (my log) and 2765 (your log)? The "homogenous element, all elements have value....." value is different. I found that the BFGS update (the function calling IpLapackPotrf) fails on some Cholesky factorizations. Is this a significant factor?

Yes. Failing Cholesky factorizations don't sound good, and could then lead to differences in the homogeneous matrix afterwards.

svigerske commented 11 months ago

I find that the matrix sent for Cholesky factorization (to "dpotrf") has some "junk" values which should have been set to zero. This has to be done while initializing the matrix. But where? For example, in IpLowRankAugSystemSolver.cpp:

line 323: bool retchol = J1->ComputeCholeskyFactor(*M1);
line 387: bool retchol = J2->ComputeCholeskyFactor(*M2);

Initializing an empty dense matrix or vector probably happens via IpBlasCopy, e.g.,

   const Number zero = 0.;
   IpBlasCopy(NCols() * NRows(), &zero, 0, values_, 1);

puts NCols() * NRows() many zeros into the array values_.
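(For context, this works because BLAS dcopy with a source increment of 0 keeps rereading the same scalar, broadcasting it into the destination. A minimal standalone sketch of the same idiom in plain C++, assuming the Fortran BLAS symbol dcopy_:

// Zero-fill a buffer via the dcopy increment-0 broadcast, the trick
// that IpBlasCopy relies on. If the linked BLAS mishandles incx = 0,
// the buffer keeps its junk values -- one way uninitialized data
// could reach dpotrf.
extern "C" void dcopy_(const int* n, const double* dx, const int* incx, double* dy, const int* incy);

void zero_fill(double* values, int nrows, int ncols)
{
   int n = nrows * ncols;
   double zero = 0.0;
   int incx = 0;   // keep reading the same source element
   int incy = 1;   // advance through the destination
   dcopy_(&n, &zero, &incx, values, &incy);
}
)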

bharswami commented 11 months ago

Hi Stefan. There was a small niggle in interfacing with the BLAS routines in Fortran. It works fine now.