Closed adamsardar closed 4 years ago
MIPCL also looks promising,
http://plato.asu.edu/ftp/milp.html
Perhaps I could cook up an Rcpp wrapper at some stage. It's lesser GNU licensed, so why not?
Actually, MIPCL looks really promising
http://plato.asu.edu/talks/euro2019.pdf
I'm fairy surprised that there is no R binding yet
Unfortunately, MIPCL does not actually release the source code - there are some precompiled .so files and that's about it. I suspect that this why we see so little uptake.
MIPCL comes as a precompiled .so file and on windows it requires an explicit installation from a MSI. I can't recommend installation of packages in such an insecure fashion, so will concentrate on CBC.
rcbc looks promising
Bad news! It looks the CBC is considerably slower than GLPK on my laptop.
See below for a benchmark of repeatedly solving repeated instances of the Steiner Forest problem (the solver was attacking the exact same problem each time).
This is very frustrating, but it shows why you need to benchmark.
Even more frustrating! CBC often finds better solutions than GLPK (although at colossal expense!)
solDetailsDT[,.N,by = .(RCBCvcount <= RGLPKvcount)]
(i.e. number of times that the CBC solution is smaller or equal size to GLPK)
RCBCvcount | N |
---|---|
TRUE | 65 |
FALSE | 1 |
solDetailsDT[,.N,by = .(RCBCvcount >= RGLPKvcount)]
(i.e. number of times that the GLPK solution is smaller or equal size to CBC)
RCBCvcount | N |
---|---|
TRUE | 59 |
FALSE | 7 |
There are 7 occassions where CBC has better (smaller) module solutions and 1 where GLPK trumps CBC.
Raw results (only 66 examples ran overnight):
RCBCtime | RCBCvcount | RGLPKtime | RGLPKvcount | nFixedTerminals | trial |
---|---|---|---|---|---|
11.614 | 55 | 2.562 | 55 | 33 | 1 |
63.622 | 60 | 162.976 | 60 | 33 | 2 |
36.094 | 54 | 42.996 | 54 | 33 | 3 |
18.221 | 58 | 12.017 | 58 | 34 | 4 |
45.737 | 56 | 50.730 | 56 | 34 | 5 |
15.391 | 57 | 4.558 | 57 | 32 | 6 |
16.098 | 60 | 3.366 | 60 | 37 | 7 |
22.082 | 58 | 4.040 | 58 | 35 | 8 |
16.979 | 60 | 7.191 | 60 | 37 | 9 |
56.859 | 57 | 3.510 | 57 | 34 | 10 |
24.934 | 55 | 15.572 | 55 | 32 | 11 |
19.679 | 55 | 5.054 | 55 | 37 | 12 |
290.190 | 54 | 19.738 | 54 | 33 | 13 |
52.809 | 49 | 16.664 | 49 | 31 | 14 |
389.285 | 66 | 129.810 | 66 | 40 | 15 |
46.422 | 57 | 11.529 | 57 | 33 | 16 |
78.082 | 57 | 5.076 | 57 | 35 | 17 |
98.783 | 72 | 104.074 | 72 | 43 | 18 |
98.279 | 64 | 5.858 | 64 | 39 | 19 |
126.270 | 74 | 91.341 | 74 | 43 | 20 |
78.224 | 53 | 57.001 | 53 | 31 | 21 |
507.897 | 54 | 115.411 | 54 | 31 | 22 |
66.557 | 58 | 404.906 | 58 | 35 | 23 |
94.955 | 59 | 7.455 | 59 | 37 | 24 |
142.105 | 54 | 60.317 | 54 | 32 | 25 |
68.722 | 44 | 9.241 | 44 | 26 | 26 |
84.844 | 46 | 11.043 | 46 | 28 | 27 |
95.870 | 59 | 380.100 | 59 | 31 | 28 |
319.403 | 60 | 20.774 | 60 | 40 | 29 |
574.199 | 61 | 56.500 | 61 | 40 | 30 |
505.806 | 55 | 87.754 | 55 | 34 | 31 |
183.985 | 56 | 850.606 | 56 | 30 | 32 |
93.381 | 66 | 12.255 | 66 | 36 | 33 |
307.693 | 48 | 164.126 | 48 | 27 | 34 |
141.956 | 57 | 9.900 | 57 | 36 | 35 |
100.401 | 60 | 52.435 | 60 | 34 | 36 |
100.475 | 61 | 12.671 | 61 | 37 | 37 |
101.982 | 59 | 16.258 | 59 | 35 | 38 |
100.246 | 63 | 11.733 | 63 | 40 | 39 |
579.275 | 60 | 513.917 | 60 | 35 | 40 |
367.616 | 50 | 97.376 | 50 | 30 | 41 |
553.308 | 54 | 45.439 | 54 | 32 | 42 |
145.731 | 59 | 16.594 | 59 | 36 | 43 |
585.326 | 55 | 33.491 | 55 | 33 | 44 |
190.230 | 56 | 43.462 | 56 | 33 | 45 |
576.630 | 56 | 24.820 | 53 | 34 | 46 |
132.229 | 62 | 46.506 | 62 | 36 | 47 |
131.538 | 62 | 13.625 | 62 | 38 | 48 |
408.818 | 65 | 108.427 | 65 | 39 | 49 |
136.814 | 64 | 13.863 | 64 | 41 | 50 |
135.638 | 56 | 28.548 | 56 | 34 | 51 |
137.032 | 68 | 10.229 | 68 | 40 | 52 |
281.978 | 62 | 66.125 | 62 | 37 | 53 |
139.237 | 49 | 28.501 | 49 | 30 | 54 |
429.285 | 59 | 10.245 | 59 | 35 | 55 |
290.578 | 50 | 11.296 | 50 | 32 | 56 |
738.847 | 52 | 13.533 | 52 | 32 | 57 |
5119.158 | 48 | 21.417 | 57 | 33 | 58 |
190.331 | 66 | 10.763 | 66 | 41 | 59 |
189.824 | 52 | 12.383 | 52 | 33 | 60 |
6107.653 | 54 | 16.670 | 63 | 38 | 61 |
4883.079 | 41 | 12.458 | 51 | 29 | 62 |
4995.475 | 48 | 86.101 | 57 | 32 | 63 |
4893.707 | 56 | 31.585 | 62 | 40 | 64 |
8644.265 | 44 | 105.102 | 57 | 32 | 65 |
6660.619 | 53 | 15.883 | 64 | 40 | 66 |
library(stoneTrees)
library(igraph)
library(microbenchmark)
fixedTerminalLymphomaGraph <- lymphomaGraph
V(fixedTerminalLymphomaGraph)$isTerminal <- FALSE
V(fixedTerminalLymphomaGraph)[nodeScore > 0]$isTerminal <- TRUE
V(fixedTerminalLymphomaGraph)$nodeScore <- -1
# I notice that the Steiner forest is actually much harder than MWCS to solve - probably the symmetry of having no node-weights
testSteinFor <- nodeCentricSteinerForestProblem$new(fixedTerminalLymphomaGraph, verbose = FALSE, solverChoice = "rcbc")
set.seed(2345)
solDetails <- list()
for(i in 1:100){
print(i)
#Ensure that our solvers are both solving the same type of problem
# We can't just run two bootstrap Stiener forest routines
testSteinFor$.__enclos_env__$private$resampleFixedTerminals()
testSteinFor$.__enclos_env__$private$solver <- "RGLPK"
rglpkTime <- system.time(RGLPKsol <- testSteinFor$findSingleSteinerSolution())
testSteinFor$.__enclos_env__$private$solver <- "RCBC"
rcbcTime <- system.time(RCBCsol <- testSteinFor$findSingleSteinerSolution())
solDetails %<>%
c(list(data.table(RCBCtime = rcpcTime["elapsed"],
RCBCvcount = vcount(RCBCsol),
RGLPKtime = rglpkTime["elapsed"],
RGLPKvcount = vcount(RGLPKsol),
nFixedTerminals = length(V(RCBCsol)[isTerminal]))))
}
rbindlist(solDetails) %>%
.[,trial := .I] %>%
melt(id.vars = "trial") %>%
.[variable %like% "time"] %>%
ggplot(aes(x = variable, y = value/60)) +
geom_boxplot(outlier.shape = NULL) +
geom_jitter(aes(colour = variable)) +
scale_y_log10() +
theme_bw() + theme(legend.position = "none") +
labs(x = "Solver", y = "Time (minues) [log scale]", title = "Comparison of CBC and GLPK solving single iterations ofthe Steiner Forest problem")
I notice that the solver is actually taking longer as more solutions are found. I believe that the previous lazy constraints are kept in the system, but maybe they should be flushed each time the fixed terminals change ...
do testSteinFor <- nodeCentricSteinerForestProblem$new(fixedTerminalLymphomaGraph, verbose = FALSE, solverChoice = "rcbc")
inside the loop just to make sure they are all independent?
Yeah, good idea. I actually think that this might have found a bug though - the lazy constraints are adding significant overhead on the search ...
Why didn't I get a better CPU when I upgraded this laptop!!?
Will benchmark that case afterwards
Bug finding is always a good thing
Calling it a bug is too much - the constraint generation is still correct.
Running CBC on a single thread - there seems to be a cumulative effect here ...
Maybe GLPK is much better at decreasing the constraints than CBC?
Ahem,
I've removed the cumulative connectivity constraints in the steinerForest routine (so when the fixed terminals change, the connectivity constraints are flushed because we are dealing with a new instance of the problem).
The decrease in runtime is astonishing:
First of all, the runtime is no longer slowly increasing.
But also, this is the same code as above - look at the y-axis! 100x speedup!
Also, we don't have that strange behaviour of GLPK being worse than CBC in terms of solution quality.
RCPCtime | RCPCvcount | RGLPKtime | RGLPKvcount | nFixedTerminals | trial |
---|---|---|---|---|---|
0.623 | 55 | 0.902 | 55 | 34 | 1 |
12.725 | 60 | 95.373 | 60 | 33 | 2 |
2.370 | 54 | 0.062 | 54 | 33 | 3 |
0.651 | 58 | 1.713 | 58 | 34 | 4 |
18.794 | 56 | 60.669 | 56 | 34 | 5 |
0.345 | 57 | 1.554 | 57 | 32 | 6 |
0.724 | 60 | 1.243 | 60 | 37 | 7 |
0.380 | 58 | 2.248 | 58 | 34 | 8 |
0.123 | 60 | 0.047 | 60 | 38 | 9 |
1.268 | 57 | 0.339 | 57 | 34 | 10 |
0.334 | 55 | 3.472 | 55 | 32 | 11 |
0.591 | 55 | 0.309 | 55 | 37 | 12 |
1.043 | 54 | 5.194 | 54 | 31 | 13 |
0.533 | 49 | 0.593 | 49 | 31 | 14 |
11.007 | 66 | 27.296 | 66 | 39 | 15 |
1.840 | 57 | 0.179 | 57 | 34 | 16 |
0.709 | 57 | 0.566 | 57 | 35 | 17 |
3.090 | 72 | 97.028 | 72 | 43 | 18 |
0.190 | 64 | 0.353 | 64 | 39 | 19 |
2.941 | 74 | 3.033 | 74 | 43 | 20 |
0.928 | 53 | 1.889 | 53 | 31 | 21 |
7.117 | 54 | 43.801 | 54 | 31 | 22 |
6.274 | 58 | 9.972 | 58 | 36 | 23 |
0.640 | 59 | 0.052 | 59 | 36 | 24 |
2.462 | 54 | 3.642 | 54 | 32 | 25 |
0.533 | 44 | 0.851 | 44 | 26 | 26 |
2.343 | 46 | 0.199 | 46 | 28 | 27 |
3.376 | 59 | 43.497 | 59 | 32 | 28 |
3.612 | 60 | 3.044 | 60 | 40 | 29 |
5.123 | 61 | 33.925 | 61 | 39 | 30 |
1.156 | 55 | 0.542 | 55 | 34 | 31 |
9.863 | 56 | 20.064 | 56 | 30 | 32 |
0.343 | 66 | 0.331 | 66 | 36 | 33 |
9.039 | 48 | 6.709 | 48 | 27 | 34 |
0.706 | 57 | 1.070 | 57 | 36 | 35 |
2.962 | 60 | 0.053 | 60 | 34 | 36 |
0.248 | 61 | 0.289 | 61 | 37 | 37 |
0.400 | 59 | 5.937 | 59 | 35 | 38 |
0.340 | 63 | 1.290 | 63 | 40 | 39 |
30.557 | 60 | 19.091 | 60 | 35 | 40 |
1.669 | 50 | 5.479 | 50 | 30 | 41 |
0.462 | 54 | 6.302 | 54 | 31 | 42 |
0.346 | 59 | 0.680 | 59 | 36 | 43 |
3.794 | 55 | 2.269 | 55 | 34 | 44 |
3.071 | 56 | 14.034 | 56 | 33 | 45 |
13.961 | 53 | 10.606 | 53 | 34 | 46 |
5.347 | 62 | 14.911 | 62 | 36 | 47 |
10.022 | 62 | 23.975 | 62 | 38 | 48 |
7.940 | 65 | 152.686 | 65 | 39 | 49 |
1.338 | 64 | 1.944 | 64 | 41 | 50 |
2.016 | 56 | 0.339 | 56 | 34 | 51 |
0.998 | 68 | 2.199 | 68 | 40 | 52 |
0.364 | 62 | 3.600 | 62 | 37 | 53 |
0.281 | 49 | 0.259 | 49 | 30 | 54 |
3.034 | 59 | 2.430 | 59 | 35 | 55 |
0.280 | 50 | 1.159 | 50 | 32 | 56 |
4.560 | 52 | 1.954 | 52 | 32 | 57 |
0.828 | 57 | 1.727 | 57 | 33 | 58 |
0.311 | 66 | 0.537 | 66 | 42 | 59 |
0.225 | 52 | 0.195 | 52 | 33 | 60 |
10.576 | 63 | 0.853 | 63 | 39 | 61 |
0.636 | 51 | 1.177 | 51 | 30 | 62 |
5.472 | 57 | 9.441 | 57 | 31 | 63 |
0.951 | 62 | 0.353 | 62 | 40 | 64 |
4.103 | 57 | 10.569 | 57 | 32 | 65 |
7.875 | 64 | 1.283 | 64 | 40 | 66 |
3.778 | 56 | 1.616 | 56 | 31 | 67 |
2.757 | 66 | 0.632 | 66 | 40 | 68 |
0.912 | 56 | 0.269 | 56 | 33 | 69 |
2.022 | 58 | 8.052 | 58 | 38 | 70 |
1.376 | 62 | 1.078 | 62 | 37 | 71 |
0.288 | 58 | 0.057 | 58 | 35 | 72 |
1.962 | 52 | 5.357 | 52 | 28 | 73 |
1.825 | 49 | 4.557 | 49 | 30 | 74 |
1.245 | 63 | 0.728 | 63 | 39 | 75 |
1.588 | 43 | 2.859 | 43 | 27 | 76 |
3.051 | 57 | 0.288 | 57 | 36 | 77 |
0.845 | 57 | 1.590 | 57 | 35 | 78 |
1.028 | 56 | 1.901 | 56 | 32 | 79 |
0.680 | 52 | 0.387 | 52 | 30 | 80 |
0.512 | 62 | 3.018 | 62 | 36 | 81 |
1.444 | 62 | 10.187 | 62 | 38 | 82 |
3.711 | 54 | 1.234 | 54 | 34 | 83 |
0.508 | 51 | 0.478 | 51 | 31 | 84 |
1.046 | 60 | 3.888 | 60 | 34 | 85 |
1.603 | 58 | 0.444 | 58 | 34 | 86 |
5.340 | 56 | 208.878 | 56 | 33 | 87 |
2.602 | 62 | 0.330 | 62 | 38 | 88 |
0.695 | 52 | 0.280 | 52 | 34 | 89 |
0.586 | 58 | 1.339 | 58 | 33 | 90 |
0.778 | 49 | 0.687 | 49 | 28 | 91 |
4.499 | 52 | 2.590 | 52 | 33 | 92 |
4.531 | 62 | 10.925 | 62 | 37 | 93 |
0.878 | 57 | 12.693 | 57 | 34 | 94 |
1.452 | 59 | 53.015 | 59 | 33 | 95 |
0.323 | 55 | 0.394 | 55 | 34 | 96 |
0.434 | 49 | 4.607 | 49 | 31 | 97 |
2.391 | 68 | 0.714 | 68 | 40 | 98 |
0.996 | 51 | 3.490 | 51 | 33 | 99 |
0.284 | 54 | 1.820 | 54 | 35 | 100 |
Interesting - so both speed up, but CBC by more. Curious to see what those changes do to the problems that seems intractable previously
It depends on why they were intractable - single Steiner tree problems (or collection of sub-optimal solutions), this will make no difference. But for the Steiner Forest problem the improvement is huge.
Some more benchmarks! Whereas previously I was running the exact same minimum Steiner tree problem, I am not trialling 100 iterations of the full Steiner forest process.
Making repeated calls of the form:
rcbcTime <- system.time(RCBCsol <- nodeCentricSteinerForestProblem$new(fixedTerminalLymphomaGraph, verbose = FALSE, solverChoice = "rcbc")$sampleMultipleBootstrapSteinerSolutions())
(i.e. running the Steiner forest routine using the defaults; 5xbootstraps and 0xiterations for sub-optimal solutions)
CBC seems a little faster (although there isn't that much in it - the speedups have been algorithmic, not solver based).
However ...
The resultant modules from CBC are larger than those from GLPK. There are two reasons why this might happen - GLPK might just find better (smaller) solutions or CBC might find more diverse solutions. I'll need to run another benchmark where I collect the module sizes in the solution pool (which are aggregated to give the result).
The question is now of whether GLPK or CBC find better (smaller) and/or more diverse solutions. This will be answered by my next benchmark.
Last benchmark; the Steiner forest process with 100 bootstraps (i.e. resampling the seed set 100x) and collecting up to five degenerate solutions on each run. I ran the process 15 times (I wanted to finish it overnight and the numbers below seem conclusive). Whilst each of these 15x2=30 runs have different sub-problems (the sampling step is stochastic), comparison of means is still reasonable. This stands in contrast to the first benchmark, where exactly the same Steiner tree problem was solved.
Running something like:
glpktim <- system.time( SteinForGLPK <- nodeCentricSteinerForestProblem$new(fixedTerminalLymphomaGraph, solverChoice = "RGLPK", verbose = FALSE)$sampleMultipleBootstrapSteinerSolutions(nBootstraps = 100, maxItr = 5) )
cbctime <- system.time( SteinForCBC <- nodeCentricSteinerForestProblem$new(fixedTerminalLymphomaGraph, solverChoice = "RCBC", verbose = FALSE)$sampleMultipleBootstrapSteinerSolutions(nBootstraps = 100, maxItr = 5) )
First of all - CBC wins on time:
Over an hour to only a few minutes!!
But as before, the CBC aggregated modules are larger:
Inspecting the individual modules collected (i.e. the solutions to the sub-problem of resampled terminals) the solutions seem equivalent (in size).
The reason for the larger aggregated modules under CBC is simply that the solver is able to find far more solutions to the subproblems:
So there we have it: CBC is much faster and finds many more solutions. This is sufficient in my mind to change the recommended solver.
I always understood that SYMPHONY and CBC were essentially the same thing. I was wrong:
https://list.coin-or.org/pipermail/cbc/2012-January/000755.html
Given that @JulieBorgel has noticed some shortcomings of GLPK, it would be interesting to support CBC and compare performance.