ERGO-Code / HiGHS

Linear optimization software
MIT License

Running in Parallel gives unexpected result. #1838

Open JeroenMNallian opened 2 months ago

JeroenMNallian commented 2 months ago

Hello. We run a MIP solver in parallel inside a Docker container, and we sometimes get very suspicious results that make us suspect we can't run it in parallel. We have seen the note in the documentation that parallelism isn't supported yet but should be by the end of 2024. Is there a timeline for this?

For every call we use the same request, yet we get different output. I have included some output from our application:

First output: Failed to solve

Running HiGHS 1.7.0 (git hash: 50670fd): Copyright (c) 2024 HiGHS under MIT licence terms
Coefficient ranges:
  Matrix [1e+00, 1e+00]
  Cost   [1e+00, 1e+01]
  Bound  [1e+00, 3e+00]
  RHS    [1e+00, 1e+00]
WARNING: 13 semi-continuous/integer variable(s) have zero lower bound so are continuous/integer
Presolving model
274 rows, 372 cols, 452 nonzeros  0s
84 rows, 182 cols, 262 nonzeros  0s

Solving MIP model with:
   84 rows
   182 cols (5 binary, 4 integer, 1 implied int., 127 continuous)
   262 nonzeros

    Nodes      |    B&B Tree     |            Objective Bounds              |  Dynamic Constraints |       Work
 Proc. InQueue |  Leaves   Expl. | BestBound       BestSol              Gap |   Cuts   InLp Confl. | LpIters     Time

     0       0         0   0.00%   272             -inf                 inf        0      0      0         0     0.0s
     0       0         0   0.00%   272             -inf                 inf        0      0      0         0     0.0s

Symmetry detection completed in 0.0s
No symmetry present

WARNING: Failed to solve node with all integer columns fixed. Declaring node infeasible.
WARNING: Failed to solve node with all integer columns fixed. Declaring node infeasible.
WARNING: Failed to solve node with all integer columns fixed. Declaring node infeasible.
WARNING: Failed to solve node with all integer columns fixed. Declaring node infeasible.
... (a few hundred times)
WARNING: Failed to solve node with all integer columns fixed. Declaring node infeasible.

Restarting search from the root node
Model after restart has 77 rows, 167 cols (2 bin., 4 int., 1 impl., 119 cont.), and 242 nonzeros

  1089       0         0   0.00%   -nan            -nan               -nan%        0      0      0       150     0.5s

Solving report
  Status            Optimal
  Primal bound      nan
  Dual bound        nan
  Gap               0% (tolerance: 0.01%)
  Solution status   -
  Timing            0.45 (total)
                    0.00 (presolve)
                    0.00 (postsolve)
  Nodes             1090
  LP iterations     226 (total)
                    0 (strong br.)
                    0 (separation)
                    150 (heuristics)
ERROR: MIP solver claims optimality, but with num/max/sum primal(-1/inf/inf) infeasibilities
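The report above pairs an `Optimal` status with `nan` bounds, which is what the final ERROR line is complaining about. A caller that only looks at the status could guard against acting on such a result by also checking that both reported bounds are finite. The following is a minimal sketch that works on the log text alone; the function name `report_bounds_are_finite` is hypothetical, and it assumes the report contains `Primal bound` and `Dual bound` entries as in the log shown here.

```python
import math
import re

# Matches "Primal bound <value>" and "Dual bound <value>" in a solving report.
_BOUND_RE = re.compile(r"(Primal|Dual) bound\s+(\S+)")


def report_bounds_are_finite(report: str) -> bool:
    """Return True only if both bounds are present and finite.

    Hypothetical helper for illustration; it is not part of HiGHS.
    """
    bounds = {}
    for kind, value in _BOUND_RE.findall(report):
        try:
            bounds[kind] = float(value)  # float() accepts "nan" and "inf"
        except ValueError:
            return False  # unparseable value: do not trust the report
    return set(bounds) == {"Primal", "Dual"} and all(
        math.isfinite(v) for v in bounds.values()
    )


bad_report = "Status Optimal\nPrimal bound nan\nDual bound nan\n"
good_report = "Status Optimal\nPrimal bound 272\nDual bound 272\n"
print(report_bounds_are_finite(bad_report))
print(report_bounds_are_finite(good_report))
```

A result that fails this check is better treated as a failed solve than as an optimal one, whatever the status field says.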

Corrupted double-linked list

274 rows, 370 cols, 450 nonzeros  0s
82 rows, 178 cols, 258 nonzeros  0s
corrupted double-linked list

Corrupted memory

We get exit code 139: memory violation.
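For context, shells and container runtimes conventionally report a process killed by a signal as exit code 128 plus the signal number, so 139 corresponds to signal 11, a segmentation fault. The small sketch below decodes such codes; `describe_exit_code` is a hypothetical helper name, and the mapping assumes that 128-plus-signal convention applies to the environment in question.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Describe an exit code using the 128 + signal-number convention."""
    if code > 128:
        number = code - 128
        try:
            name = signal.Signals(number).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {number}"
        return f"terminated by {name}"
    return f"exited with status {code}"


print(describe_exit_code(139))
```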

CPU maxes

We also sometimes see the CPU maxing out at 100%, and it won't come down until we kill the app.

Can you give us any guidance on whether we can use this tool to solve our problem, or whether we should look elsewhere?

jajhall commented 2 months ago

Are you running multiple serial instances of the MIP solver in parallel? Or a single instance with multiple threads?

Are you able to export the model as an MPS file?

JeroenMNallian commented 2 months ago

We are using an ASP.NET Core application to serve this to a UI, so we run multiple serial instances in parallel. We create a new solver for every instance but share the DLL. We dispose of the solver after every call.

How do you export the model as an MPS file? I did a quick search but couldn't find an example.

jajhall commented 2 months ago

In what language are you passing your model to HiGHS and (trying to) solve it?

That said, since you're running multiple serial instances in parallel, it suggests that it's not some problem with the particular instance.

How many instances have you run in parallel by the time this error is encountered?

There's no doubt that people are running multiple instances of HiGHS in parallel and not encountering any issues like this - which is consistent with us believing HiGHS to be thread safe. The only recent issue relating to running multiple instances was https://github.com/ERGO-Code/HiGHS/issues/1771 and this is fixed in v1.7.2.
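The usage pattern under discussion (one solver object per request, never shared between threads, released after each call) can be sketched as follows. `StandInSolver` is a hypothetical placeholder written for this illustration; it is not the HiGHS API, and its `solve` method only returns the smallest cost so that the sketch runs without HiGHS installed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


class StandInSolver:
    """Hypothetical stand-in for a solver handle (not the HiGHS API)."""

    def __init__(self) -> None:
        self.costs: List[float] = []
        self.closed = False

    def __enter__(self) -> "StandInSolver":
        return self

    def __exit__(self, *exc_info) -> None:
        # Mirrors disposing of the solver after every call.
        self.closed = True

    def add_col(self, cost: float) -> None:
        self.costs.append(cost)

    def solve(self) -> float:
        # Placeholder "solve": return the smallest cost.
        return min(self.costs)


def handle_request(costs: List[float]) -> float:
    # Each request builds and releases its own solver; nothing is shared.
    with StandInSolver() as solver:
        for cost in costs:
            solver.add_col(cost)
        return solver.solve()


def handle_requests_in_parallel(requests: List[List[float]], workers: int = 4) -> List[float]:
    # pool.map returns results in the same order as the requests.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle_request, requests))


print(handle_requests_in_parallel([[3.0, 1.0], [5.0, 2.0]]))
```

If failures still appear with this structure, as they do here even when requests run one at a time, the cause is more likely in how a single model is built than in the concurrency itself.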

JeroenMNallian commented 2 months ago

We use C# to pass the model to HiGHS.

We have created a new solver anywhere between once and more than 100 times before the error occurs; there isn't really a definitive number. We also noticed that the same error occurs when running serially (request by request).

We will update to 1.7.2 and rerun our tests again. (Currently on 1.7.0)

jajhall commented 2 months ago

> We use C# to pass the model to HiGHS.

In that case use Highs_writeModel

https://github.com/ERGO-Code/HiGHS/blob/5ce7a27531a7f4166ee5a8343169a1014febb41a/src/interfaces/highs_csharp_api.cs#L191

passing "foo.mps" as filename.

jajhall commented 2 months ago

> We also noticed this when running it serial (so request by request) it also gives the same error.

Very odd. Maybe it is worth extracting the model.

JeroenMNallian commented 2 months ago

We updated our packages to 1.7.2, and I can reproduce the issue locally in our application. The corrupted-memory error is thrown by changeColsIntegralityByRange (see screenshot below).

[image: screenshot]

We haven't been able to get a model out of the code yet, because our code stops working before we get any output.

jajhall commented 2 months ago

Your list of integrality values must be as long as the number of columns in the range. You appear to be passing only one, so it will read beyond the end of that list.
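To make the length requirement concrete: for an inclusive column range from `from_col` to `to_col`, the list needs `to_col - from_col + 1` entries. The sketch below builds such a list and validates it before it would be handed to the solver. The helper names are hypothetical, and the two constants mirror `kHighsVarTypeContinuous` and `kHighsVarTypeInteger` from the HiGHS C API (0 and 1); they are redefined here as an assumption so that the sketch runs without HiGHS.

```python
from typing import List

# Assumed to match kHighsVarTypeContinuous / kHighsVarTypeInteger in HiGHS.
VAR_TYPE_CONTINUOUS = 0
VAR_TYPE_INTEGER = 1


def expected_length(from_col: int, to_col: int) -> int:
    """Number of entries needed for the inclusive range from_col..to_col."""
    if from_col < 0 or to_col < from_col:
        raise ValueError(f"invalid column range {from_col}..{to_col}")
    return to_col - from_col + 1


def integrality_for_range(from_col: int, to_col: int, var_type: int) -> List[int]:
    """Build one integrality entry per column in the range."""
    return [var_type] * expected_length(from_col, to_col)


def check_integrality(from_col: int, to_col: int, integrality: List[int]) -> None:
    """Raise if the list length does not match the column range."""
    need = expected_length(from_col, to_col)
    if len(integrality) != need:
        raise ValueError(
            f"range {from_col}..{to_col} needs {need} entries, got {len(integrality)}"
        )


values = integrality_for_range(3, 7, VAR_TYPE_INTEGER)
check_integrality(3, 7, values)  # passes: five columns, five entries
print(len(values))
```

Passing a single-element list for the same range would fail the check, which is the situation described above.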

JeroenMNallian commented 2 months ago

> Your list of integrality values must be as long as the number of columns in the range. You appear to be passing only one, so it will read beyond that

I will check this.

Also, after some changes in the code I managed to reproduce the following error: "Failed to solve node with all integer columns fixed. Declaring node infeasible." This keeps looping until we hit our maximum allowed solve time (15 seconds). I have attached two files for the same request: error.txt and noError.txt. The run in error.txt goes into a loop; the run in noError.txt gives a result.