coin-or / Ipopt

COIN-OR Interior Point Optimizer IPOPT
https://coin-or.github.io/Ipopt

Ipopt with Intel MKL issue #381

Closed koutoui closed 4 years ago

koutoui commented 4 years ago

Hi all,

I'm not sure if this is the right place to ask this. If it isn't, feel free to close this issue and, if you can, please redirect me to the right place.

I've tried to build and run Ipopt with Intel MKL Pardiso, to take advantage of the parallel optimizations of the MKL Pardiso solvers. I've used the precompiled Windows binary from release 3.13.2 in this repo, and it seems to be working with mkl_sequential. I want to use the parallel version, so I changed the configure script to link to mkl_rt instead of the sequential library and managed to build Ipopt (on CentOS 7) linked against the right libraries.

However, when I run experiments, I see the core utilization jumping back and forth between 0% and 100%, and the run ends up even slower than with the sequential library. I've also noticed that the total CPU time spent in Ipopt without function evaluations is larger than the time the whole IPOPT App OptimizeTNLP() call takes.

Does Ipopt only support the sequential version of Intel MKL, so that it is up to the user to spawn new threads, or is something else going on? Feel free to ask me clarifying questions if this description is not clear enough.

LOGS:

This is Ipopt version 3.13.2, running with linear solver pardiso.

Number of nonzeros in equality constraint Jacobian...:    66018
Number of nonzeros in inequality constraint Jacobian.:   401092
Number of nonzeros in Lagrangian Hessian.............:   999052

Total number of variables............................:   150084
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equality constraints.................:    22008
Total number of inequality constraints...............:    66849
        inequality constraints with only lower bounds:    66849
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0  7.9675812e+02 1.48e+00 2.17e+00  -1.0 0.00e+00    -  0.00e+00 0.00e+00   0
   1  8.5640400e+02 6.96e-09 3.07e+00  -1.0 3.71e+01    -  3.18e-01 1.00e+00f  1
   2  1.0489324e+03 4.63e-11 1.17e+00  -1.0 3.89e+01    -  6.34e-01 1.00e+00f  1
   3  1.6132220e+03 2.17e-11 5.32e-01  -1.0 5.53e+01    -  5.43e-01 1.00e+00f  1
   4  1.2160557e+03 1.73e-11 2.34e-01  -1.7 4.44e+01    -  7.52e-01 1.00e+00f  1
   5  1.0862365e+03 4.26e-11 1.33e-01  -1.7 1.93e+01    -  6.85e-01 1.00e+00f  1
   6  8.6618266e+02 1.63e-11 4.56e-02  -2.5 3.97e+01    -  8.44e-01 1.00e+00f  1
   7  8.2245031e+02 3.25e-12 1.19e-02  -2.5 1.27e+01    -  8.67e-01 1.00e+00f  1
   8  8.0446445e+02 2.46e-12 1.13e-02  -3.8 1.21e+01    -  5.77e-01 6.89e-01f  1
   9  8.0017886e+02 1.02e-10 3.16e-03  -3.8 5.32e+00    -  6.13e-01 5.34e-01f  1
iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
  10  7.9870068e+02 6.07e-11 1.34e-02  -3.8 2.97e+00    -  7.57e-01 4.07e-01f  1
  11  7.9757434e+02 2.44e-11 1.32e-02  -3.8 1.95e+00    -  1.00e+00 5.98e-01f  1
  12  7.9701097e+02 2.81e-09 8.31e-07  -3.8 8.54e-01    -  1.00e+00 1.00e+00f  1
  13  7.9685933e+02 1.38e-09 7.33e-03  -5.7 8.58e-01    -  8.42e-01 5.10e-01f  1
  14  7.9679180e+02 5.59e-10 4.71e-03  -5.7 5.85e-01    -  8.41e-01 5.95e-01f  1
  15  7.9676961e+02 1.89e-10 3.13e-03  -5.7 3.25e-01    -  1.00e+00 6.62e-01f  1
  16  7.9676278e+02 2.20e-07 1.11e-03  -5.7 1.49e-01    -  1.00e+00 7.79e-01f  1
  17  7.9676114e+02 2.84e-14 1.84e-11  -5.7 4.73e-02    -  1.00e+00 1.00e+00f  1
  18  7.9676020e+02 2.84e-14 3.23e-04  -8.6 3.68e-02    -  9.82e-01 7.32e-01f  1
  19  7.9675992e+02 1.14e-12 1.20e-04  -8.6 2.37e-02    -  1.00e+00 8.12e-01f  1
iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
  20  7.9675986e+02 4.50e-14 7.17e-06  -8.6 1.25e-02    -  1.00e+00 9.67e-01f  1
  21  7.9675985e+02 2.84e-14 2.99e-13  -8.6 4.92e-03    -  1.00e+00 1.00e+00f  1
  22  7.9675985e+02 2.84e-14 2.01e-13  -9.0 1.93e-03    -  1.00e+00 1.00e+00h  1

Number of Iterations....: 22

                                   (scaled)                 (unscaled)
Objective...............:   7.9675985383414275e+02    7.9675985383414275e+02
Dual infeasibility......:   2.0089825552524002e-13    2.0089825552524002e-13
Constraint violation....:   2.8421709430404007e-14    2.8421709430404007e-14
Complementarity.........:   4.6163345254284267e-09    4.6163345254284267e-09
Overall NLP error.......:   4.6163345254284267e-09    4.6163345254284267e-09

Number of objective function evaluations             = 23
Number of objective gradient evaluations             = 23
Number of equality constraint evaluations            = 23
Number of inequality constraint evaluations          = 23
Number of equality constraint Jacobian evaluations   = 1
Number of inequality constraint Jacobian evaluations = 1
Number of Lagrangian Hessian evaluations             = 1
Total CPU secs in IPOPT (w/o function evaluations)   =    129.221
Total CPU secs in NLP function evaluations           =      1.364

EXIT: Optimal Solution Found.
IPOPT App OptimizeTNLP(np) took 87.382109 s.

*** IPOPT: The problem solved in 22 iterations!

*** IPOPT: The final value of the objective function is 796.75985383414275

svigerske commented 4 years ago

I doubt that I can help much. There isn't much support available for this :).

If the time you measured for OptimizeTNLP is wall-clock time and MKL was running in parallel, i.e., using several CPU cores, then the total CPU seconds are typically larger than the wall-clock time, because CPU time is summed over all cores. The CPU seconds aren't really useful in this case, and it might be nicer if Ipopt printed wall-clock time by default.
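To make the distinction concrete, here is a minimal Python sketch. The helper names `timed` and `average_busy_cores` are made up for illustration and are not part of Ipopt. It assumes that the 87.38 s reported for OptimizeTNLP is wall-clock time and that it covers both the Ipopt and the function-evaluation CPU seconds from the log.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (result, wall_seconds, cpu_seconds).

    perf_counter() measures elapsed wall-clock time; process_time()
    measures CPU time of the whole process, summed over its threads.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = fn(*args, **kwargs)
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    return result, wall_seconds, cpu_seconds


def average_busy_cores(cpu_seconds, wall_seconds):
    """Rough average number of cores kept busy during the run."""
    return cpu_seconds / wall_seconds


# Numbers copied from the log in this thread.
cpu_total = 129.221 + 1.364   # Ipopt + NLP function evaluations (CPU s)
wall_total = 87.382109        # assumed to be wall-clock seconds

print(f"average busy cores: {average_busy_cores(cpu_total, wall_total):.2f}")
```

Under those assumptions the ratio comes out at roughly 1.5, i.e. on average only about one and a half cores were busy, which would match the observation of cores alternating between idle and fully loaded.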

To get MKL to use more threads, you have to set the environment variable OMP_NUM_THREADS or MKL_NUM_THREADS, I believe. But since you report that several CPU cores were utilized, that seems to have happened already.
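As a sketch of how the thread count could be controlled from a launcher script, the Python snippet below builds an environment with both variables set and passes it to a child process. `env_with_threads` is a hypothetical helper, and the child here is just the Python interpreter echoing the variable back. In practice the command would be whatever program links Ipopt and MKL.

```python
import os
import subprocess
import sys


def env_with_threads(num_threads, base_env=None):
    """Return a copy of the environment with the thread-count variables set.

    The current process environment is left untouched; only the copy
    handed to the child process carries the new values.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = str(num_threads)
    env["MKL_NUM_THREADS"] = str(num_threads)
    return env


if __name__ == "__main__":
    # Stand-in child process: replace with the actual Ipopt-based program.
    child = [sys.executable, "-c",
             "import os; print(os.environ['MKL_NUM_THREADS'])"]
    done = subprocess.run(child, env=env_with_threads(4),
                          capture_output=True, text=True, check=True)
    print("child saw MKL_NUM_THREADS =", done.stdout.strip())
```

Running the same problem with 1, 2, 4, ... threads and comparing wall-clock times is a simple way to see whether the parallel library helps at all for a given problem size.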

As for the actual question of whether Ipopt is supposed to run well with parallel Pardiso from Intel MKL, I cannot really comment. Parallelization usually comes with overhead, and it only pays off if the problem is large enough and the parallelization is implemented well: there needs to be enough work done in parallel to compensate for the additional overhead of maintaining the parallelization.
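The trade-off can be illustrated with a toy cost model in Python. The formula and the numbers are assumptions chosen purely for illustration; they are not measurements of MKL Pardiso or Ipopt.

```python
def modeled_time(work, threads, overhead_per_thread):
    """Toy model: perfectly divisible work plus a coordination cost
    that grows with each additional thread (illustrative assumption)."""
    return work / threads + overhead_per_thread * (threads - 1)


for work in (1.0, 100.0):
    sequential = modeled_time(work, threads=1, overhead_per_thread=0.5)
    parallel = modeled_time(work, threads=8, overhead_per_thread=0.5)
    verdict = "faster" if parallel < sequential else "slower"
    print(f"work={work}: 1 thread {sequential:.3f}, "
          f"8 threads {parallel:.3f} ({verdict})")
```

In this model the small job gets slower with 8 threads because the coordination cost dominates, while the large job gets faster. Where the break-even point lies for a real solver depends on the problem and the implementation.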