ERGO-Code / HiGHS

Linear optimization software
MIT License
929 stars 173 forks source link

Crash depending on order of ROWS in MPS File #963

Open chriswasser opened 1 year ago

chriswasser commented 1 year ago

Hi everyone,

While playing around with HiGHS through the interface of the PuLP package, I noticed some unexpected behaviour when HiGHS receives an MPS input file. Since PuLP does not enforce any specific ordering on the constraints in the ROWS section (or any other section for that matter) of the generated MPS file, different MPS files are produced for the same instance. In this issue, I have attached two instances of a MIP scheduling problem (.txt files need to be renamed back to .mps), which work as minimal reproducers for the observed issue:

They are both feasible instances (as confirmed by Gurobi and SCIP). The only difference between these files: Two consecutive lines in the ROWS section are swapped

$ diff --unified correct-result.mps.txt incorrect-result.mps.txt
--- correct-result.mps.txt  2022-09-28 00:13:52.000000000 +0200
+++ incorrect-result.mps.txt    2022-09-27 23:25:34.000000000 +0200
@@ -2025,8 +2025,8 @@
  L  DataTransferForDependencyConstraint_XKN9FnoZHH_tMQYOOXFMk_Node1CPU1Socket1_Node2CPU1Socket2
  L  DataTransferForDependencyConstraint_XKN9FnoZHH_tMQYOOXFMk_Node1CPU1Socket2_Node1CPU1Socket1
  L  DataTransferForDependencyConstraint_XKN9FnoZHH_tMQYOOXFMk_Node1CPU1Socket2_Node2CPU1Socket1
- L  DataTransferForDependencyConstraint_6AomkypFfA_XKN9FnoZHH_Node2CPU1Socket2_Node1CPU1Socket2
  L  DataTransferForDependencyConstraint_XKN9FnoZHH_tMQYOOXFMk_Node1CPU1Socket2_Node2CPU1Socket2
+ L  DataTransferForDependencyConstraint_6AomkypFfA_XKN9FnoZHH_Node2CPU1Socket2_Node1CPU1Socket2
  L  DataTransferForDependencyConstraint_XKN9FnoZHH_tMQYOOXFMk_Node2CPU1Socket1_Node1CPU1Socket1
  L  DataTransferForDependencyConstraint_XKN9FnoZHH_tMQYOOXFMk_Node2CPU1Socket1_Node1CPU1Socket2
  L  DataTransferForDependencyConstraint_XKN9FnoZHH_tMQYOOXFMk_Node2CPU1Socket1_Node2CPU1Socket2

HiGHS is able to correctly solve the first instance correct-result.mps while it crashes with a segmentation fault for the second instance incorrect-result.mps. The full output for both executions with HiGHS version 1.2.2 is given below:

# downloaded from https://github.com/ERGO-Code/HiGHS/archive/refs/tags/v1.2.2.tar.gz
# --> just as a side note: the output says HiGHS 1.2.1 because the released CMakeLists.txt contains `set(HIGHS_VERSION_PATCH 1)`
$ highs correct-result.mps                
Running HiGHS 1.2.1 [date: 2022-08-27, git hash: n/a]
Copyright (c) 2022 ERGO-Code under MIT licence terms
Number of BV entries in BOUNDS section is 508
MIP  correct-result has 2454 rows; 541 cols; 9140 nonzeros; 508 integer variables
Presolving model
2450 rows, 541 cols, 9080 nonzeros
2290 rows, 461 cols, 8664 nonzeros

Solving MIP model with:
   2290 rows
   461 cols (444 binary, 0 integer, 0 implied int., 17 continuous)
   8664 nonzeros

( 0.6s) Starting symmetry detection
( 0.7s) Found 4 generators

        Nodes      |    B&B Tree     |            Objective Bounds              |  Dynamic Constraints |       Work      
     Proc. InQueue |  Leaves   Expl. | BestBound       BestSol              Gap |   Cuts   InLp Confl. | LpIters     Time

         0       0         0   0.00%   0.0295229048    inf                  inf        0      0      0         0     0.7s
         0       0         0   0.00%   0.0295229048    inf                  inf        0      0      0        51     0.8s
 L       0       0         0   0.00%   0.0295229476    0.0295344351       0.04%     4409     38      0       357     6.1s
 L       0       0         0   0.00%   0.0295229476    0.0295267625       0.01%     4409     16      0      1501     7.2s
 B      15       0         1   0.01%   0.0295229476    0.0295229903       0.00%     4409     16      2      1997     7.5s

Solving report
  Status            Optimal
  Primal bound      0.0295229903309
  Dual bound        0.0295229903309
  Gap               0% (tolerance: 0.01%)
  Solution status   feasible
                    0.0295229903309 (objective)
                    0 (bound viol.)
                    0 (int. viol.)
                    0 (row viol.)
  Timing            7.47 (total)
                    0.55 (presolve)
                    0.00 (postsolve)
  Nodes             19
  LP iterations     2012 (total)
                    3 (strong br.)
                    306 (separation)
                    1353 (heuristics)
$ highs incorrect-result.mps 
Running HiGHS 1.2.1 [date: 2022-08-27, git hash: n/a]
Copyright (c) 2022 ERGO-Code under MIT licence terms
Number of BV entries in BOUNDS section is 508
MIP  incorrect-result has 2454 rows; 541 cols; 9140 nonzeros; 508 integer variables
Presolving model
2450 rows, 541 cols, 9080 nonzeros
2290 rows, 461 cols, 8664 nonzeros

Solving MIP model with:
   2290 rows
   461 cols (444 binary, 0 integer, 0 implied int., 17 continuous)
   8664 nonzeros

( 0.6s) Starting symmetry detection
( 0.7s) Found 4 generators

        Nodes      |    B&B Tree     |            Objective Bounds              |  Dynamic Constraints |       Work      
     Proc. InQueue |  Leaves   Expl. | BestBound       BestSol              Gap |   Cuts   InLp Confl. | LpIters     Time

         0       0         0   0.00%   0.0295229048    inf                  inf        0      0      0         0     0.7s
         0       0         0   0.00%   0.0295229048    inf                  inf        0      0      0        51     0.8s
highs: src/lp_data/HighsSolve.cpp:67: HighsStatus solveLp(HighsLpSolverObject&, std::string): Assertion `solver_object.solution_.value_valid' failed.
zsh: abort (core dumped)  highs incorrect-result.mps

As far as I could tell from the online documentation of the MPS file format here, the order of constraints in the ROWS section should not matter:

The NAME card can have anything you want, starting in column 15.  The
ROWS section defines the names of all the constraints; entries in
column 2 or 3 are E for equality rows, L for less-than ( <= ) rows, G
for greater-than ( >= ) rows, and N for non-constraining rows (the
first of which would be interpreted as the objective function).  The
order of the rows named in this section is unimportant.

However, as there does not seem to be a formal specification of the MPS file format, I am unsure whether HiGHS follows this specification as well. Is this behaviour a bug in HiGHS or actually expected? If the constraints in the ROWS section need to be ordered, in which way do they need to be arranged such that tools like PuLP can respect this?

I am happy to provide whatever information is missing. Thanks for any feedback on this issue! I look forward hearing from you 🤓

Greetings

Christian

odow commented 1 year ago

Nice detective skills! I bet this was a painful example to find.

However, as there does not seem to be a formal specification of the MPS file format, I am unsure whether HiGHS follows this specification as well.

The lack of MPS specifications causes all sorts of issues 😢

Is this behaviour a bug in HiGHS or actually expected?

It's a bug in HiGHS

If the constraints in the ROWS section need to be ordered, in which way do they need to be arranged such that tools like PuLP can respect this?

They don't need to be ordered

chriswasser commented 1 year ago

Thanks for the quick confirmation 👍

Nice detective skills! I bet this was a painful example to find.

You couldn't be more right but I guess your detective work is also just starting. Good luck finding the bug 🍀🐛

jajhall commented 1 year ago

As @odow says, the order of the rows shouldn't matter but, in general, it leads to solvers behaving differently. If the problem has a non-unique solution, interchanging the rows can lead to the same solver getting different optimal solutions.

I've reproduced the bug that you expose in v1.2.2, and will investigate it next week. Note that HiGHS master doesn't show the same behaviour, but the MIP solver has been modified recently.

Thanks

chriswasser commented 1 year ago

@jajhall You are correct! The reproducers from above both succeed with the current HiGHS master. Overall the number of errors I see greatly decreased. However, I have two more version of the same instance:

Again, only constraints in the ROWS section are reordered:

$ diff --unified correct-result.mps incorrect-result.mps
--- correct-result.mps  2022-09-28 13:08:53.048554942 +0200
+++ incorrect-result.mps        2022-09-28 13:03:41.258087838 +0200
@@ -18,20 +18,20 @@
  E  ComputeCompletionTimeConstraint_ueIOJ2gMjs
  E  ComputeCompletionTimeConstraint_hpGPSGljDO
  E  ComputeCompletionTimeConstraint_2WuMzKPxKJ
+ L  OrderJobsForFixedPrecedencesConstraint_tMQYOOXFMk_Z7IsibK9ce
+ L  OrderJobsForFixedPrecedencesConstraint_AP8tLyAnxt_oj8kVdk68o
+ L  OrderJobsForFixedPrecedencesConstraint_9G63kLGQll_oPNNsnuGgk
+ L  OrderJobsForFixedPrecedencesConstraint_XKN9FnoZHH_tMQYOOXFMk
+ L  OrderJobsForFixedPrecedencesConstraint_3DtQqCz3c9_hpGPSGljDO
+ L  OrderJobsForFixedPrecedencesConstraint_8kufsenGfq_Z7IsibK9ce
+ L  OrderJobsForFixedPrecedencesConstraint_ueIOJ2gMjs_hpGPSGljDO
+ L  OrderJobsForFixedPrecedencesConstraint_gZKLkx7Y7u_XKN9FnoZHH
  L  OrderJobsForFixedPrecedencesConstraint_zFhrLyUcyv_tMQYOOXFMk
  L  OrderJobsForFixedPrecedencesConstraint_oj8kVdk68o_XKN9FnoZHH
- L  OrderJobsForFixedPrecedencesConstraint_Z7IsibK9ce_ueIOJ2gMjs
- L  OrderJobsForFixedPrecedencesConstraint_ueIOJ2gMjs_hpGPSGljDO
+ L  OrderJobsForFixedPrecedencesConstraint_6AomkypFfA_XKN9FnoZHH
  L  OrderJobsForFixedPrecedencesConstraint_oPNNsnuGgk_XKN9FnoZHH
- L  OrderJobsForFixedPrecedencesConstraint_9G63kLGQll_oPNNsnuGgk
- L  OrderJobsForFixedPrecedencesConstraint_gZKLkx7Y7u_XKN9FnoZHH
- L  OrderJobsForFixedPrecedencesConstraint_tMQYOOXFMk_Z7IsibK9ce
- L  OrderJobsForFixedPrecedencesConstraint_XKN9FnoZHH_tMQYOOXFMk
+ L  OrderJobsForFixedPrecedencesConstraint_Z7IsibK9ce_ueIOJ2gMjs
  L  OrderJobsForFixedPrecedencesConstraint_6FM5SHtllE_ueIOJ2gMjs
- L  OrderJobsForFixedPrecedencesConstraint_AP8tLyAnxt_oj8kVdk68o
- L  OrderJobsForFixedPrecedencesConstraint_6AomkypFfA_XKN9FnoZHH
- L  OrderJobsForFixedPrecedencesConstraint_8kufsenGfq_Z7IsibK9ce
- L  OrderJobsForFixedPrecedencesConstraint_3DtQqCz3c9_hpGPSGljDO
  L  RespectMakespanConstraint_9G63kLGQll
  L  RespectMakespanConstraint_oPNNsnuGgk
  L  RespectMakespanConstraint_AP8tLyAnxt

As before, HiGHS is able to correctly solve the first instance correct-result.mps while it crashes with a segmentation fault (although a different one) for the second instance incorrect-result.mps. The full output for both executions with the HiGHS master is given below:

$ highs correct-result.mps 
Running HiGHS 1.2.2 [date: 2022-09-28, git hash: 9c7b2cf2]
Copyright (c) 2022 ERGO-Code under MIT licence terms
Number of BV entries in BOUNDS section is 508
MIP  correct-result has 2454 rows; 541 cols; 9140 nonzeros; 508 integer variables
Presolving model
2450 rows, 541 cols, 9080 nonzeros
2290 rows, 461 cols, 8664 nonzeros

Solving MIP model with:
   2290 rows
   461 cols (444 binary, 0 integer, 0 implied int., 17 continuous)
   8664 nonzeros

        Nodes      |    B&B Tree     |            Objective Bounds              |  Dynamic Constraints |       Work      
     Proc. InQueue |  Leaves   Expl. | BestBound       BestSol              Gap |   Cuts   InLp Confl. | LpIters     Time

         0       0         0   0.00%   0.0295229048    inf                  inf        0      0      0         0     0.6s
         0       0         0   0.00%   0.0295229048    inf                  inf        0      0      0        51     0.7s
 L       0       0         0   0.00%   0.0295229476    0.0295325951       0.03%     1687     35      0       212     3.3s

0.2% inactive integer columns, restarting
Model after restart has 2289 rows, 460 cols (443 bin., 0 int., 0 impl., 17 cont.), and 8666 nonzeros

         0       0         0   0.00%   0.0295229476    0.0295325951       0.03%       13      0      0       641     3.6s
         0       0         0   0.00%   0.0295229476    0.0295325951       0.03%       13      5      0       652     3.7s
 L       0       0         0   0.00%   0.0295229476    0.0295305774       0.03%     1786     33      0       818     5.5s

Symmetry detection completed in 0.1s
Found 1 generators

 L       0       0         0   0.00%   0.0295229476    0.0295305347       0.03%     1786     14      0      1168     6.3s
 B       0       0         0   0.00%   0.0295229476    0.029526848        0.01%     1786     14      1      1244     6.6s
 T      53       7         7  12.56%   0.0295229476    0.0295267197       0.01%     1822     17      6      1968     7.0s
 T      79       0        16  50.11%   0.0295229476    0.0295229476       0.00%     1823     17     14      2057     7.3s

Solving report
  Status            Optimal
  Primal bound      0.0295229475758
  Dual bound        0.0295229475758
  Gap               0% (tolerance: 0.01%)
  Solution status   feasible
                    0.0295229475758 (objective)
                    0 (bound viol.)
                    0 (int. viol.)
                    0 (row viol.)
  Timing            7.30 (total)
                    0.66 (presolve)
                    0.00 (postsolve)
  Nodes             81
  LP iterations     2059 (total)
                    28 (strong br.)
                    366 (separation)
                    829 (heuristics)
$ highs incorrect-result.mps 
Running HiGHS 1.2.2 [date: 2022-09-28, git hash: 9c7b2cf2]
Copyright (c) 2022 ERGO-Code under MIT licence terms
Number of BV entries in BOUNDS section is 508
MIP  incorrect-result has 2454 rows; 541 cols; 9140 nonzeros; 508 integer variables
Presolving model
2450 rows, 541 cols, 9080 nonzeros
2290 rows, 461 cols, 8664 nonzeros

Solving MIP model with:
   2290 rows
   461 cols (444 binary, 0 integer, 0 implied int., 17 continuous)
   8664 nonzeros

        Nodes      |    B&B Tree     |            Objective Bounds              |  Dynamic Constraints |       Work      
     Proc. InQueue |  Leaves   Expl. | BestBound       BestSol              Gap |   Cuts   InLp Confl. | LpIters     Time

         0       0         0   0.00%   0.0295229048    inf                  inf        0      0      0         0     0.6s
         0       0         0   0.00%   0.0295229048    inf                  inf        0      0      0        51     0.7s
 L       0       0         0   0.00%   0.0295229476    0.02953641         0.05%     1687     35      0       212     3.2s

0.2% inactive integer columns, restarting
Model after restart has 2289 rows, 460 cols (443 bin., 0 int., 0 impl., 17 cont.), and 8666 nonzeros

         0       0         0   0.00%   0.0295229476    0.02953641         0.05%       13      0      0       405     3.5s
         0       0         0   0.00%   0.0295229476    0.02953641         0.05%       13      5      0       416     3.6s
 L       0       0         0   0.00%   0.0295229476    0.0295325951       0.03%     1793     33      0       582     6.0s

Symmetry detection completed in 0.1s
Found 1 generators

highs: src/mip/HighsDomain.h:483: void HighsDomain::fixCol(HighsInt, double, HighsDomain::Reason): Assertion `infeasible_ == 0' failed.
zsh: abort (core dumped)  highs incorrect-result.mps

Hopefully these new examples can help you to pinpoint the issue next week.

Greetings

jajhall commented 1 year ago

Thanks, I'd rather fix master directly than have to learn from v1.2.2. However, this looks like a different issue

jajhall commented 1 year ago

Assertion `infeasible_ == 0' failing is not an error, merely an indication that infeasibility could have been detected earlier, so just a minor inefficiency. In release mode, MIP should solve fine. Nonetheless, I'll try to make the most of this example

chriswasser commented 1 year ago

You are completely right, it solves fine to optimality in Release mode!

I previously ran HiGHS only with RelWithDebug mode enabled. I did not expect such a different output (or even a core dump in this case) depending on the build type. Maybe this behaviour will be unexpected for other users as well?!

Do you want me to close this issue or keep it open as a reference point for you?

jajhall commented 1 year ago

It's an assert, so "identifiable" termination, rather than something like a segfault and, by default, HiGHS builds in release. If folk build in RelWithDebug then they know that this can happen. ;-)

I spoke to our developer last week, and know how to go about making the code more efficient so that the assert isn't triggered. However, I won't be doing this soon, so let's leave this as a reference point.

chriswasser commented 1 year ago

I agree that knowing where the application crashed is better than a general segfault without any hint. However, since there is no comment (in the output or within the code itself) why this assertion failed, any user is still without actionable information. I guess it just depends on whether you see debug builds (or rather non-Release builds which disable assertions with -DNDEBUG) as something a user should be able to run or not.

Thanks for providing such prompt feedback. Cheers 👍

jajhall commented 1 year ago

Fair point: I'll add a comment to the code for now to indicate that this is an efficiency warning rather than an error, and I guess it's only fair to warn users who compile with RelWithDebug or debug that asserts may be triggered.

We're still learning from users, so there are thanks to you, too.