ERGO-Code / HiGHS

Linear optimization software
MIT License
869 stars, 166 forks

Julia binaries of HiGHS hang after solving a simple LP on Windows #1044

Open jajhall opened 1 year ago

jajhall commented 1 year ago

I've just downloaded the v1.4.0 Windows binaries and reproduced the errant behaviour: occasional "hanging" after HiGHS has solved the problem (forrest6). Yet I don't get it with (say) v1.2.1.

I can't recreate it with a local build of HiGHS v1.4.0, which is unfortunate, as debugging via creation of the binaries is not possible.

pjaborges commented 1 year ago

I experienced the same. On Windows 7 Enterprise it always works. On Windows 10 Enterprise it sometimes stalls.

jajhall commented 1 year ago

Thanks: that its behaviour varies from one Windows variant to another seems only to add to the difficulty of tracking down what's happening.

I've suggested that the other user take the driver

https://github.com/ERGO-Code/HiGHS/blob/master/app/RunHighs.cpp

to create a local version of the command-line executable. It would have to be compiled, and then linked to the static library that comes with the executable that's failing to terminate. However, others are using the static library successfully to call HiGHS from their own C++ code.

Another work-around is to build HiGHS from source locally. Once CMake and the C++ compiler are set up properly, this works fine.

Finally, for the Python-oriented, HiGHS is in PyPI so HiGHS can be run using

pip install highspy

and then writing a Python driver. Note that it may be necessary to update "pip", and even "python".
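
A minimal sketch of such a Python driver, assuming highspy is installed and that the model is available as an MPS or LP file. The function name solve_lp is hypothetical. highspy is imported inside the function so that the module itself loads even where highspy is missing.

```python
def solve_lp(model_path):
    """Solve the model stored in model_path and return (status, objective)."""
    import highspy  # imported lazily; requires `pip install highspy`

    h = highspy.Highs()
    # readModel returns a HighsStatus; treat an error as fatal.
    if h.readModel(model_path) == highspy.HighsStatus.kError:
        raise RuntimeError(f"HiGHS could not read {model_path}")
    h.run()
    status = h.modelStatusToString(h.getModelStatus())
    objective = h.getInfo().objective_function_value
    return status, objective
```

Calling solve_lp on a model file returns the model status as a string together with the objective value, which is roughly what the last lines of the command-line log report.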

pjaborges commented 1 year ago

I did some more tests (win 7) that may help track this:

With this options setup for file output, I get the message and HiGHS doesn't terminate, even though the file is written correctly to the selected directory. [image]

With this setting, the file is also written to the selected directory, but HiGHS does not terminate. [image]

If no options for solution output are used, it works properly (but fails on Windows 10, as mentioned previously).

pjaborges commented 1 year ago

It seems to me that something is definitely preventing highs from terminating. I call highs.exe from my C# console app by starting a process with arguments. The issues above disappear once I call the Dispose method (which frees the resources used by the process) on the process. Maybe it's the connection to the output file?

guifcoelho commented 1 year ago

Hello there! I had the same issue with the static executable for Windows but the one with shared libraries worked fine.

jajhall commented 1 year ago

Thanks for your observations @pjaborges and @guifcoelho. The last "action" in https://github.com/ERGO-Code/HiGHS/blob/master/app/RunHighs.cpp writes out the model being solved if write_model_to_file=true.

When write_model_to_file=true and the static executable hangs, the model is written OK. Since it also hangs occasionally when write_model_to_file=false, it seems safe to assume that HiGHS reaches line 83 of RunHighs.cpp.

This isn't going to be fixed in the short term, so the best advice for Windows users for whom HiGHS hangs in this way is to create their own executable by compiling and linking RunHighs.cpp.

jajhall commented 1 year ago

Comments on #1137 give hope of fixing this

jannicklange commented 1 year ago

I also stumbled over this in HiGHS 1.4.2. I use the following options file:

write_solution_style=4
log_file=Highs_logs\2023-04-14_15-24-50-682_highs_0.log

I started HiGHS as an external process within a C# application. Sometimes the process would not exit. The workaround that I used was to keep looking for the solution file, and kill the HiGHS process once the solution file was no longer accessed by HiGHS. I.e. call this code in a while-loop (with some sensible Task.Delay) until no exception is thrown :D

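try
{
    // FileShare.None requests exclusive access, so Open throws while HiGHS still holds the file.
    using (var dummy = fileName.Open(FileMode.Open, FileAccess.Read, FileShare.None))
    {
        // Nothing to read: a successful open means HiGHS has released the file.
    }
}
catch
{
    this._logger.Log(LogLevel.Debug, "Accessed file while it was still opened by HiGHS");
    continue; // retry on the next pass of the enclosing while-loop
}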
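
For anyone launching HiGHS from Python rather than C#, a similar workaround can be sketched with the standard library alone. This is not a port of the code above: portable Python has no direct equivalent of FileShare.None, so the sketch assumes that a solution file whose size has stopped changing for `grace` seconds is complete. The name run_and_reap and all of its parameters are hypothetical. Delete any stale solution file before calling it, otherwise the old file is mistaken for the new one.

```python
import os
import subprocess
import time


def run_and_reap(cmd, solution_path, grace=2.0, poll=0.1, limit=600.0):
    """Run cmd and kill it if it lingers after writing solution_path.

    Returns True if the process exited by itself, and False if it was killed,
    either because the solution file stayed unchanged for `grace` seconds or
    because the overall `limit` in seconds was reached.
    """
    proc = subprocess.Popen(cmd)
    start = time.monotonic()
    stable_since = None  # time at which the file size was first seen unchanged
    last_size = -1
    while proc.poll() is None:
        now = time.monotonic()
        size = os.path.getsize(solution_path) if os.path.exists(solution_path) else -1
        if size >= 0 and size == last_size:
            if stable_since is None:
                stable_since = now
        else:
            stable_since = None
        last_size = size
        file_done = stable_since is not None and now - stable_since >= grace
        if file_done or now - start >= limit:
            proc.kill()
            proc.wait()  # reap the killed process
            return False
        time.sleep(poll)
    return True
```

A return value of False after a normal solve means the process had to be killed, which is the hang described in this thread.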

jajhall commented 1 year ago

Oh dear @jannicklange, I'm sorry you've had to do something so ugly. Can you not set threads=1 and prevent this?

I think we're going to have to set something like threads=4 by default, and modify it internally if it (or the value set by a user) exceeds half the threads available.
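
The capping rule proposed here can be illustrated with a few lines of Python. This is only a sketch of the arithmetic, not HiGHS code: the function name effective_threads is hypothetical, and treating a non-positive request as "use the cap" is an assumption made for the illustration.

```python
import os


def effective_threads(requested=4, available=None):
    """Cap `requested` at half the available hardware threads, never below 1."""
    if available is None:
        available = os.cpu_count() or 1  # os.cpu_count() may return None
    cap = max(1, available // 2)
    if requested <= 0:  # assumption: a non-positive request means "automatic"
        return cap
    return min(requested, cap)
```

With 4 hardware threads available, a request for 8 would be reduced to 2, while the default of 4 on a 16-thread machine would be left unchanged.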

e-zaline commented 10 months ago

Hello, I seem to have a similar problem.

     546484     2.9477613845e+04 Pr: 0(0); Du: 1948(0.458696) 1713s
     548358     2.9475743023e+04 Pr: 0(0); Du: 0(1.90613e-09) 1722s
     548358     2.9475743023e+04 Pr: 0(0); Du: 0(1.90613e-09) 1722s
WARNING: Number of threads available = 4 < 8 = Simplex concurrency to be used: Parallel performance may be less than anticipated
Using EKK parallel dual simplex solver - PAMI with concurrency of 8
  Iteration        Objective     Infeasibilities num(sum)
     548358     2.9475743023e+04 Pr: 5(0.00227596); Du: 0(4.38972e-09) 1724s
WARNING:    Increasing Markowitz threshold to 0.5
     548358     2.9475743023e+04 Pr: 5(0.00227596); Du: 0(4.38508e-09) 1774s
     548358     2.9475743023e+04 Pr: 5(0.00227596); Du: 0(4.38508e-09) 1825s
     548358     2.9475743023e+04 Pr: 5(0.00227596); Du: 0(4.38508e-09) 1875s

I am using Windows 11. I can provide an example if need be. Thank you!

jajhall commented 10 months ago

Thanks, but it's easy to reproduce. We just haven't got around to identifying an internal fix.

Just set 'threads=1'

odow commented 7 months ago

@jhay778 just encountered this working on OpenSolver. Setting threads=1 seemed to fix it. It hung pretty frequently. It looked like some of the threads were not getting cleaned up properly after the solution file was written.

They were using https://github.com/JuliaBinaryWrappers/HiGHSstatic_jll.jl/releases/tag/HiGHSstatic-v1.6.0%2B0

jhay778 commented 7 months ago

Running highs.exe and observing it in Process Monitor, it seems that 9 threads are opened and only 2 are closed. I can force it to close through Task Manager, which causes all threads to close and the process to exit. It solves the input model and writes to the solution file, but just does not terminate correctly.

Setting threads = 1 bypasses the issue for me. Setting threads = 2 causes hanging infrequently. Setting threads = 3 or 4 causes hanging more frequently. Setting threads = 10 or 0 causes hanging essentially every call (most likely because only 9 threads are created when threads are unlimited anyway).

jajhall commented 7 months ago

Thanks, this is useful information in addressing this issue.

NPC commented 2 weeks ago

@jhay778 (I know it was a while ago, so if you can remember) did you set threads via the command line, or via the parameters file? I tried the command line and get a “no such parameter” error.

PS This is still happening, v1.7.1 highs.exe on Win11. I hope to post more data once I've investigated more.

jajhall commented 1 week ago

No, threads cannot be set as a command-line parameter, only in an options file.

https://ergo-code.github.io/HiGHS/stable/executable/#Command-line-options
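
As a sketch of what that looks like when HiGHS is launched from a script, the Python below writes an options file containing threads=1 and builds the matching command line. The helper names, the temporary file location and the default executable name are assumptions. The --model_file and --options_file flags are the ones listed on the documentation page linked above, and the name=value format is the one used in the options file posted earlier in this thread.

```python
import os
import tempfile


def write_options_file(options, directory=None):
    """Write one name=value line per option to a temporary file; return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt", dir=directory, text=True)
    with os.fdopen(fd, "w") as f:
        for name, value in options.items():
            f.write(f"{name}={value}\n")
    return path


def highs_command(model_path, options_path, exe="highs"):
    """Build the argument list for a HiGHS run that reads its options from a file."""
    return [exe, "--model_file", model_path, "--options_file", options_path]
```

The list returned by highs_command can be passed directly to subprocess.run. The caller is responsible for deleting the temporary options file afterwards.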

NPC commented 1 week ago

@jajhall here are my observations, perhaps they'll help, but I understand that Windows is not your typical environment, so it's hard to solve issues you don't even observe.

The issue occurs on very simple LP problems, infrequently. In my stress test, it usually hangs only after hundreds of iterations (adding a sleep between calls, as suggested in the issue linked in this discussion, doesn't seem to help). Here's the MPS I used in stress testing:

NAME          001-basic
ROWS
 N  _OBJ_
 L  R3
 L  R2
COLUMNS
    C1        _OBJ_     -.89000000000   R2        1.00000000000
    C1        R3        1.00000000000
RHS
    RHS1      R2       100.00000000000   R3       100.00000000000
BOUNDS
 UP BND1      C1        100.00000000000
ENDATA
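
A stress test of this kind can be approximated with a short Python harness. This is a sketch, not the harness used for the observations here: the function name count_hangs is hypothetical, and it assumes that any run still alive after `timeout` seconds counts as a hang.

```python
import subprocess


def count_hangs(cmd, runs=100, timeout=10.0):
    """Run cmd `runs` times and return how many runs were killed on timeout."""
    hangs = 0
    for _ in range(runs):
        try:
            # subprocess.run kills the child itself when the timeout expires,
            # then raises TimeoutExpired.
            subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            hangs += 1
    return hangs
```

Running it once with an options file that sets threads=1 and once without gives a rough hang rate for each setting on a given machine.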

It seems to only require presolve, here's the log (the “failed” iteration, I can't see any difference from the “good” ones):

Running HiGHS 1.7.1 (git hash: 43329e528): Copyright (c) 2024 HiGHS under MIT licence terms
LP   001-basic has 2 rows; 1 cols; 2 nonzeros
Coefficient ranges:
  Matrix [1e+00, 1e+00]
  Cost   [9e-01, 9e-01]
  Bound  [1e+02, 1e+02]
  RHS    [1e+02, 1e+02]
Presolving model
0 rows, 0 cols, 0 nonzeros  0s
0 rows, 0 cols, 0 nonzeros  0s
Presolve : Reductions: rows 0(-2); columns 0(-1); elements 0(-2) - Reduced to empty
Solving the original LP from the solution after postsolve
Model   status      : Optimal
Objective value     : -8.9000000000e+01
HiGHS run time      :          0.00

Setting parallel=off does NOT help, oddly, but threads=1 helps. So it's not even the main LP solver that causes the issue, but something more, ahem, infrastructural (which you likely already knew).

Also, on my desktop PC with 12 physical CPU cores (24 max threads) only threads=1 helps. But on my laptop with 8 cores (16 threads) it looks like threads=2 is stable. This is probably useless to you, but still odd.

PS In all of the above I'm using highs.exe. I switched to it from calling the DLL from C# after not being able to resolve https://github.com/ERGO-Code/HiGHS/issues/1547#issuecomment-1888139724. HiGHS instability on Windows is a concern, to be honest, but so far we've been able to find workarounds, and your advice is always highly appreciated.

PPS Have you considered adding timestamps to each .log line? It's not a big deal, but it would help to see when the last log entry was written, plus the relative timings between lines. For appended logs it would also help identify individual runs (including the date).
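
Until something like that exists, a caller that captures the HiGHS output can add the timestamps itself. The Python sketch below is an assumption about how one might do this, not a HiGHS feature: stamp_lines is a hypothetical helper, and each timestamp records when the wrapper read the line, not when HiGHS wrote it.

```python
from datetime import datetime


def stamp_lines(lines, now=datetime.now):
    """Yield each log line prefixed with an ISO-8601 timestamp (to the second)."""
    for line in lines:
        yield f"{now().isoformat(timespec='seconds')} {line.rstrip()}"
```

Passing the stdout of a subprocess.Popen object opened with stdout=subprocess.PIPE and text=True as `lines` stamps each line as it arrives.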