ERGO-Code / HiGHS

Linear optimization software
MIT License
986 stars 183 forks source link

Inexplicable performance variation #1547

Open NPC opened 11 months ago

NPC commented 11 months ago

Hello,

I'm really struggling to pinpoint this issue. When I call HiGHS solver from my application, for one specific (very simple!) problem, the optimisation time is much higher than expected. Yet if I try to debug or reproduce it, it works fine. Perhaps you could help me see where the issue is (…or not).

This is on Windows 11, calling via C# interface, but for debugging I call in the same conditions (from what I can tell!).

The input .mps file attached: too-slow-2023-12-11.zip

I've set log_dev_level to 3, but comparing the two log files I can't see any difference between one an another case (apart from the time, of course).

Here's the “slow” log:

Running HiGHS 1.6.0: Copyright (c) 2023 HiGHS under MIT licence terms
Running with 8 thread(s)
Solving model: LP-problem
Presolving model
19 rows, 59 cols, 103 nonzeros
HPresolve::removeDependentEquations Got 7 equations, checking for dependent equations
HPresolve::removeDependentEquations Removed 0 rows and 0 nonzeros
15 rows, 45 cols, 82 nonzeros
presolve_.run() returns status: Reduced
Presolve : Reductions: rows 15(-6); columns 45(-16); elements 82(-29)
Solving the presolved LP
Using EKK parallel dual simplex solver - PAMI with concurrency of 8
Cost perturbation for LP-problem
   Initially have 45 nonzero costs (100%) with min / average / max = 0.4 / 73.9265 / 256
   Large so set max_abs_cost = sqrt(sqrt(max_abs_cost)) = 4
   Perturbation column base = 2e-06
   Perturbation row    base = 1e-12
dual-phase-1-start
       Iteration        Objective     Infeasibilities num(sum)
DuPh1          0    -1.1799937799e+00 Ph1: 1(1); Du: 1(1.17999) No reason
       Iteration        Objective    
DuPh1          1     0.0000000000e+00
       Iteration        Objective     Infeasibilities num(sum)
DuPh1          1     0.0000000000e+00 Ph1: 0(0) Possibly optimal
dual-phase-1-optimal
dual-phase-2-start
       Iteration        Objective     Infeasibilities num(sum)
DuPh2          1    -5.9881191554e+07 Pr: 15(3.39494e+06) No reason
       Iteration        Objective    
DuPh2          2    -5.8700226771e+07
DuPh2          3    -5.4798309406e+07
DuPh2          4    -5.1449702811e+07
DuPh2          5    -4.5085582062e+07
DuPh2          6    -4.3351304830e+07
DuPh2          7    -3.9409862782e+07
DuPh2          8    -3.6631797483e+07
DuPh2          9    -3.3317519821e+07
DuPh2         10    -3.0530258413e+07
DuPh2         11    -3.0087393072e+07
DuPh2         12    -3.0087391906e+07
DuPh2         13    -2.9704527676e+07
DuPh2         14    -2.8421173722e+07
DuPh2         15    -2.8421170942e+07
DuPh2         16    -2.8421162628e+07
DuPh2         17    -2.7592579526e+07
DuPh2         18    -2.2446835277e+07
DuPh2         19    -2.2446824310e+07
DuPh2         20    -2.2446823837e+07
DuPh2         21    -2.2446823808e+07
DuPh2         22    -2.2446823148e+07
DuPh2         23    -2.2446822801e+07
DuPh2         24    -1.9396814345e+07
DuPh2         25    -1.7078950457e+07
DuPh2         26    -1.7078950122e+07
DuPh2         27    -1.7078950076e+07
DuPh2         28    -1.7078950161e+07
DuPh2         29    -1.7078949906e+07
DuPh2         30    -1.7078949798e+07
       Iteration        Objective     Infeasibilities num(sum)
DuPh2         30    -1.7078949798e+07 Pr: 0(0); Du: 0(8.83463e-06) Possibly optimal
dual-phase-2-optimal
dual-cleanup-shift
       Iteration        Objective     Infeasibilities num(sum)
DuPh2         30    -1.7078915498e+07 Pr: 0(0); Du: 2(2.95116e-05) Perturbation cleanup
HEkkDual:: Using primal simplex to try to clean up num / max / sum = 2 / 1.01069e-05 / 2.95116e-05 dual infeasibilities
Primal feasible and num / max / sum dual infeasibilities of 2 / 1.01069e-05 / 2.95116e-05, so near-optimal
Near-optimal, so don't use bound perturbation
primal-phase2-start
       Iteration        Objective     Infeasibilities num(sum)
PrPh2         30    -1.7078915498e+07 Pr: 0(0); Du: 2(2.95116e-05) No reason
       Iteration        Objective    
PrPh2         31    -1.7078915498e+07
       Iteration        Objective     Infeasibilities num(sum)
PrPh2         31    -1.7078915498e+07 Pr: 0(0); Du: 0(1.4097e-05) Possibly optimal
primal-phase-2-optimal
problem-optimal
Simplex iterations: DuPh1 1; DuPh2 29; PrPh2 1; Total 31
EKK dual simplex solver returns 0 primal and 0 dual infeasibilities: Status Optimal
Have num/max/sum primal (0/0/0) and dual (1/0.00115163/0.00115178) unscaled infeasibilities
Using EKK primal simplex solver
primal-phase2-start
Value(  27) =    -0.5906 exceeds lower = -3.437e-07 by    0.5906, so shift bound by    0.5906 to    -0.5906: infeasibility -1.687e-05 with error 4.91042e-17
Value(   1) =    -0.3787 exceeds lower = -4.538e-07 by    0.3787, so shift bound by    0.3787 to    -0.3787: infeasibility -1.908e-05 with error 1.31324e-17
Value(  22) =  1.226e+06 exceeds upper =  1.226e+06 by    0.3125, so shift bound by    0.3125 to  1.226e+06: infeasibility -1.026e-05 with error 3.75305e-12
phase2CorrectPrimal: num / max / sum primal corrections = 3 / 0.590639 / 1.28185
       Iteration        Objective     Infeasibilities num(sum)
PrPh2         31    -1.7078916544e+07 Pr: 0(0); Du: 1(0.00115178) No reason
       Iteration        Objective    
PrPh2         32    -1.7078916559e+07
       Iteration        Objective     Infeasibilities num(sum)
PrPh2         32    -1.7078916559e+07 Pr: 0(0); Du: 0(3.25726e-07) Possibly optimal
primal-phase-2-optimal
primal-cleanup-shift
       Iteration        Objective     Infeasibilities num(sum)
PrPh2         32    -1.7078915513e+07 Pr: 0(0); Du: 0(3.25726e-07) Perturbation cleanup
problem-optimal
Simplex iterations: PrPh2 1; Total 1
EKK primal simplex solver returns 0 primal and 0 dual infeasibilities: Status Optimal
Postsolve finished
Solving the original LP from the solution after postsolve
Using EKK primal simplex solver
primal-phase2-start
Value(  64) = -4.301e+04 exceeds lower = -4.301e+04 by  0.008659, so shift bound by  0.008674 to -4.301e+04: infeasibility -1.442e-05 with error 2.79149e-12
Value(  38) =  -0.002573 exceeds lower = -7.635e-08 by  0.002573, so shift bound by  0.002585 to  -0.002585: infeasibility -1.153e-05 with error 1.01644e-19
Value(   1) =  -0.005225 exceeds lower = -6.322e-08 by  0.005225, so shift bound by  0.005236 to  -0.005236: infeasibility -1.126e-05 with error 1.66018e-19
Value(  33) =  1.916e+04 exceeds upper =  1.916e+04 by 0.0008196, so shift bound by 0.0008384 to  1.916e+04: infeasibility -1.882e-05 with error 8.93627e-13
phase2CorrectPrimal: num / max / sum primal corrections = 4 / 0.00867361 / 0.0173325
       Iteration        Objective     Infeasibilities num(sum)
PrPh2         32    -1.7078917684e+07 Pr: 0(0); Du: 8(695.04) No reason
Value(  14) =  -0.005774 exceeds lower = -4.631e-07 by  0.005773, so shift bound by  0.005792 to  -0.005793: infeasibility -1.926e-05 with error 1.32137e-19
       Iteration        Objective    
PrPh2         33    -1.7078916369e+07
Value(  15) = -0.0007562 exceeds lower = -2.854e-07 by 0.0007559, so shift bound by 0.0007716 to -0.0007719: infeasibility -1.571e-05 with error 1.35525e-20
PrPh2         34    -1.7078916196e+07
PrPh2         35    -1.7078917268e+07
PrPh2         36    -1.7078917300e+07
PrPh2         37    -1.7078917325e+07
       Iteration        Objective     Infeasibilities num(sum)
PrPh2         37    -1.7078917325e+07 Pr: 0(0); Du: 0(3.05052e-05) Possibly optimal
primal-phase-2-optimal
primal-cleanup-shift
       Iteration        Objective     Infeasibilities num(sum)
PrPh2         37    -1.7078915571e+07 Pr: 0(0); Du: 0(3.05052e-05) Perturbation cleanup
problem-optimal
Simplex iterations: PrPh2 5; Total 5
EKK primal simplex solver returns 0 primal and 0 dual infeasibilities: Status Optimal
Have num/max/sum primal (0/0/0) and dual (2/0.000822368/0.00164502) unscaled infeasibilities
Using EKK primal simplex solver
Primal feasible and num / max / sum dual infeasibilities of 2 / 0.000822368 / 0.00164502, so near-optimal
Near-optimal, so don't use bound perturbation
primal-phase2-start
       Iteration        Objective     Infeasibilities num(sum)
PrPh2         37    -1.7078915571e+07 Pr: 0(0); Du: 2(0.00164502) No reason
       Iteration        Objective    
PrPh2         38    -1.7078915571e+07
       Iteration        Objective     Infeasibilities num(sum)
PrPh2         38    -1.7078915571e+07 Pr: 0(0); Du: 0(4.32342e-07) Possibly optimal
primal-phase-2-optimal
problem-optimal
Simplex iterations: PrPh2 1; Total 1
EKK primal simplex solver returns 0 primal and 0 dual infeasibilities: Status Optimal
Postsolve  : 6
Time       :    90.00
Time Pre   :     0.00
Time PreLP :    89.99
Time PostLP:     0.00
For LP       LP-problem: Presolve     0.00 (  0%): Solve presolved LP    89.99 ( 99%): Solve original LP     0.00 (  0%)
Model   status      : Optimal
Simplex   iterations: 38
Objective value     : -1.7078915571e+07
HiGHS run time      :         90.00

Via file compare, the only difference with the “fast” (debug) log is this:

Time       :     0.45
Time Pre   :     0.02
Time PreLP :     0.23
Time PostLP:     0.20
For LP       LP-problem: Presolve     0.02 (  3%): Solve presolved LP     0.23 ( 49%): Solve original LP     0.20 ( 45%)
Model   status      : Optimal
Simplex   iterations: 38
Objective value     : -1.7078915571e+07
HiGHS run time      :          0.47

I'm quite puzzled by this, and would greatly appreciate any feedback.

jajhall commented 11 months ago

I've downloaded your example, and it solves in no time from the command line - which is what you would also expect, given your comments. My only observation was that on a none-too-new PC, the run time was 0.01s, rather than your 0.45s

Looking at the output for your "rogue" run, I observe that the 32 iterations of parallel dual simplex required to solve the presolved LP take 89.99s, but the 6 iterations of serial primal simplex required to solve the original LP from the optimal solution of the presolved LP take "0.00" seconds. This makes me think that there's an issue with multithreading.

Running the dual simplex solver in parallel - particularly for a problem of this size - is of questionable value so try omitting parallel=on option (however you set it).

Of course the "debug" run also uses parallel dual simplex. When you say "debug", I guess you're compiling with the debugging flag (by setting cmake -DCMAKE_BUILD_TUPE=debug) otherwise I don't see the difference in the "debug" run

Unfortunately, if this rogue behaviour is due to calling from C# then we can't help. The HiGHS C# API was written by someone else, and I don't even know how to run HiGHS via C#

NPC commented 11 months ago

@jajhall thank you for your response and investigation. Turning parallel off doesn't help. Reverting to 1.5.3 doesn't help either. I agree that it looks like an environment issue, but I can't figure out what.

By debug I mean running from our webapplication vs running from C# code in the Visual Studio (so it's the same HiGHS DLL, not a different “debug” version). Which shouldn't have much difference either (I compare the .mps we generate etc), but obviously something is different.

Thanks again for trying to help, this is something on my side, and I don't even know how to keep looking. I doubt you'll be able to help further at this point, so you can close the issue.

jajhall commented 11 months ago

Thanks. Sorry I can't help, but it does sound like something at your end.

NPC commented 10 months ago

@jajhall hello again, I'm not sure if this qualifies as a new issue, plus I'm very aware that this is likely something with how the library is called and multithreading, but I hope for some advice. I've ran accross another example where HiGHS exe solves a problem in seconds, but calling from C# seems to take forever (I simply stopped waiting).

Yet, disabling parallel works, and I get a quick solution. This is something I'll have to do for now, even though for our larger problems parallel was showing a significant improvement, and was one of the reasons we went with HiGHS.

I'm not uploading .mps yet (should I?), but perhaps you'll be able to recommend what to investigate in this situation. Here's the log for the “endless” case:

Running HiGHS 1.6.0: Copyright (c) 2023 HiGHS under MIT licence terms
Presolving model
145 rows, 107305 cols, 745933 nonzeros
145 rows, 107214 cols, 745842 nonzeros
Presolve : Reductions: rows 145(-117); columns 107214(-99); elements 745842(-216)
Solving the presolved LP
Using EKK parallel dual simplex solver - PAMI with concurrency of 8
  Iteration        Objective     Infeasibilities num(sum)
          0     0.0000000000e+00 Ph1: 0(0) 0s
        321    -9.8922096347e+06 Pr: 0(0); Du: 0(9.7114e-09) 2s
        321    -9.8922096347e+06 Pr: 0(0); Du: 0(9.7114e-09) 2s
Using EKK parallel dual simplex solver - PAMI with concurrency of 8
  Iteration        Objective     Infeasibilities num(sum)
        321    -9.8922096347e+06 Pr: 11(0.00263214); Du: 0(4.58177e-11) 2s
WARNING:    Increasing Markowitz threshold to 0.5
        321    -9.8922096347e+06 Pr: 11(0.00263214); Du: 0(4.58191e-11) 7s
        321    -9.8922096347e+06 Pr: 11(0.00263214); Du: 0(4.58191e-11) 12s
...

…Then 321 repeats for as long as I waited (over an hour). Does this repeating iteration number mean something? Is the warning significant? And why is the “Using EKK parallel dual simplex solver - PAMI with concurrency of 8” stated twice in the log?

Here's the successful log, with parallel off, quite a contrast (although “Using EKK dual simplex solver - serial” line is still repeated):

Running HiGHS 1.6.0: Copyright (c) 2023 HiGHS under MIT licence terms
Presolving model
145 rows, 107305 cols, 745933 nonzeros
145 rows, 107214 cols, 745842 nonzeros
Presolve : Reductions: rows 145(-117); columns 107214(-99); elements 745842(-216)
Solving the presolved LP
Using EKK dual simplex solver - serial
  Iteration        Objective     Infeasibilities num(sum)
          0     0.0000000000e+00 Ph1: 0(0) 0s
        350    -9.8922096347e+06 Pr: 0(0); Du: 0(7.6456e-09) 2s
        350    -9.8922096347e+06 Pr: 0(0); Du: 0(7.6456e-09) 2s
Using EKK dual simplex solver - serial
  Iteration        Objective     Infeasibilities num(sum)
        350    -9.8922096347e+06 Pr: 9(0.0021); Du: 0(3.96555e-11) 2s
        390    -9.8922096347e+06 Pr: 0(0); Du: 0(3.23401e-05) 2s
Solving the original LP from the solution after postsolve
Using EKK primal simplex solver
  Iteration        Objective     Infeasibilities num(sum)
        390    -9.8922096347e+06 Pr: 0(0); Du: 17(0.000294) 2s
        392    -9.8922096347e+06 Pr: 0(0); Du: 0(2.94003e-05) 2s
Model   status      : Optimal
Simplex   iterations: 392
Objective value     : -9.8922096347e+06
HiGHS run time      :          2.47

Again, I understand that this is not a HiGHS issue in itself, but looking at the log differences I wonder if you could make some experienced observations. Thank you!

PS (sorry if this is too much information!) With log_dev_level=3, I see a different log pattern starting around iteration 315… and notice how iteration 321 gets into endless loop to 329 and back:

DuPh2        315    -9.8922544906e+06
       Iteration        Objective     Infeasibilities num(sum)
DuPh2        315    -9.8922544942e+06 Pr: 0(0); Du: 0(9.59158e-06) Synthetic clock
dual-phase-2-optimal
dual-cleanup-shift
       Iteration        Objective     Infeasibilities num(sum)
DuPh2        315    -9.8922094863e+06 Pr: 0(0); Du: 11(0.00526849) Perturbation cleanup
HEkkDual:: Using primal simplex to try to clean up num / max / sum = 11 / 0.00150528 / 0.00526849 dual infeasibilities
primal-phase2-start
       Iteration        Objective     Infeasibilities num(sum)
PrPh2        315    -9.8922094863e+06 Pr: 0(0); Du: 11(0.00526849) No reason
       Iteration        Objective    
PrPh2        316    -9.8922094863e+06
PrPh2        317    -9.8922095206e+06
PrPh2        318    -9.8922095231e+06
PrPh2        319    -9.8922095921e+06
PrPh2        320    -9.8922095921e+06
PrPh2        321    -9.8922096347e+06
       Iteration        Objective     Infeasibilities num(sum)
PrPh2        321    -9.8922096347e+06 Pr: 0(0); Du: 0(9.7114e-09) Possibly optimal
primal-phase-2-optimal
problem-optimal
Simplex iterations: DuPh2 315; PrPh2 6; Total 321
EKK dual simplex solver returns 0 primal and 0 dual infeasibilities: Status Optimal
Have num/max/sum primal (11/0.000985222/0.00263214) and dual (0/6.62359e-12/4.58177e-11) unscaled infeasibilities
Using EKK parallel dual simplex solver - PAMI with concurrency of 8
Dual feasible with unperturbed costs and num / max / sum primal infeasibilities of 11 / 0.000985222 / 0.00263214, so near-optimal
Near-optimal, so don't use cost perturbation
dual-phase-2-start
       Iteration        Objective     Infeasibilities num(sum)
DuPh2        321    -9.8922096347e+06 Pr: 11(0.00263214); Du: 0(4.58177e-11) No reason
       Iteration        Objective    
DuPh2        322    -9.8922096347e+06
DuPh2        323    -9.8922096347e+06
DuPh2        324    -9.8922096347e+06
DuPh2        325    -9.8922096347e+06
DuPh2        326    -9.8922096347e+06
DuPh2        327    -9.8922096347e+06
DuPh2        328    -9.8922096347e+06
       Iteration        Objective     Infeasibilities num(sum)
DuPh2        321    -9.8922096347e+06 Pr: 11(0.00263214); Du: 0(4.58191e-11) Possibly singular basis
DuPh2        322    -9.8922096347e+06
DuPh2        323    -9.8922096347e+06
DuPh2        324    -9.8922096347e+06
DuPh2        325    -9.8922096347e+06
DuPh2        326    -9.8922096347e+06
DuPh2        327    -9.8922096347e+06
DuPh2        328    -9.8922096347e+06
DuPh2        329    -9.8922096347e+06
WARNING:    Increasing Markowitz threshold to 0.5
       Iteration        Objective     Infeasibilities num(sum)
DuPh2        321    -9.8922096347e+06 Pr: 11(0.00263214); Du: 0(4.58191e-11) Possibly singular basis
DuPh2        322    -9.8922096347e+06
...

(and so on and on and on)

galabovaa commented 3 months ago

Would you mind sharing the MPS file, particularly if you observe this issue when you debug in VSCode? I am wondering if I can replicate the hanging behaviour if I try to solve the MPS file from our C++ generated executable directly. The occasional hanging may be related to the threads bug on Windows, which is now observed both from Julia and Python and now C#. It must be something in the C++ code underneath the wrappers and it would be very helpful if I can replicate this

NPC commented 3 months ago

@galabovaa thank you for looking into this! Here's the mps file that caused this issue back in January: LP-problem.zip

Please note that since then we've changed our logic to calling highs.exe instead of calling the DLL from C#, as the DLL proved to be too unreliable for production usage. The .exe has its issues too, as you noted in the other ticket.