coin-or / pulp

A python Linear Programming API
http://coin-or.github.io/pulp/

PULP_CBC_CMD solver gets stuck on Windows #671

Open sohviluukkonen opened 1 year ago

sohviluukkonen commented 1 year ago

Details for the issue

What did you do?

I tried to run the minexample.py script below with this data on Windows.

What did you expect to see?

I expected the solver to finish within about 10 seconds, since I set timeLimit=10.

What did you see instead?

The solver gets stuck (for at least several minutes), with the final stdout line being `Cbc0010I After 0 nodes, 1 on tree, 0.00018135401 best solution, best possible 3.5152484e-05 (1.29 seconds)`.

Useful extra information

The script works without problems on Linux.

```python
from pulp import *
import numpy as np
from typing import List

def merge_clusters_with_balancing_mapping(
    tasks_vs_clusters_array : np.array,
    sizes : List[float] = [0.9, 0.1, 0.1], 
    equal_weight_perc_compounds_as_tasks : bool = False,
    relative_gap : float = 0,
    time_limit_seconds : int = 60,
    max_N_threads : int = 1) -> List[List[int]]:
    """
    Linear programming function needed to balance the data while merging clusters.

    Paper: Tricarico et al., Construction of balanced, chemically dissimilar training, validation 
    and test sets for machine learning on molecular datasets, 2022, 
    DOI: https://doi.org/10.26434/chemrxiv-2022-m8l33-v2

    Parameters
    ----------
    tasks_vs_clusters_array : 2D np.array
        - the cross-tabulation of the number of data points per cluster, per task.
        - columns represent unique clusters.
        - rows represent tasks, except the first row, which represents the number of records (or compounds).
        - Optionally, instead of the number of data points, the provided array may contain the *percentages*
            of data points _for the task across all clusters_ (i.e. each *row*, NOT column, may sum to 1).
        IMPORTANT: make sure the array has 2 dimensions, even if only balancing the number of data records,
            so there is only 1 row. This can be achieved by setting ndmin = 2 in the np.array function.
    sizes : list
        - list of the desired final sizes (will be normalised to fractions internally).
    equal_weight_perc_compounds_as_tasks : bool
        - if True, matching the % records will have the same weight as matching the % data of individual tasks.
        - if False, matching the % records will have a weight X times larger than each of the X tasks.
    relative_gap : float
        - the relative gap between the absolute optimal objective and the current one at which the solver
        stops and returns a solution. Can be very useful for cases where the exact solution requires
        far too long to be found to be of any practical use.
        - set to 0 to obtain the absolute optimal solution (if reached within the time_limit_seconds)
    time_limit_seconds : int
        - the time limit in seconds for the solver (default here: 60 seconds)
        - after this time, whatever solution is available is returned
    max_N_threads : int
        - the maximal number of threads to be used by the solver.
        - it is advisable to set this number as high as allowed by the available resources.

    Returns
    ------
    List (of length equal to the number of columns of tasks_vs_clusters_array) of final cluster identifiers
        (integers, numbered from 1 to len(sizes)), mapping each unique initial cluster to its final cluster.
    Example: if sizes == [20, 10, 70], the output will be a list like [3, 3, 1, 2, 1, 3...], where
        '1' represents the final cluster of relative size 20, '2' the one of relative size 10, and '3' the 
        one of relative size 70.
    """

    # Calculate the fractions from sizes

    fractional_sizes = sizes / np.sum(sizes)

    S = len(sizes)

    # Normalise the data matrix
    tasks_vs_clusters_array = tasks_vs_clusters_array / tasks_vs_clusters_array.sum(axis = 1, keepdims = True)

    # Find the number of tasks + compounds (M) and the number of initial clusters (N)
    M, N = tasks_vs_clusters_array.shape
    if (S > N):
        errormessage = 'The requested number of new clusters to make ('+ str(S) + ') cannot be larger than the initial number of clusters (' + str(N) + '). Please review.'
        raise ValueError(errormessage)

    # Given matrix A (M x N) of fraction of data per cluster, assign each cluster to one of S final ML subsets,
    # so that the fraction of data per ML subset is closest to the corresponding fraction_size.
    # The weights on each ML subset (WML, S x 1) are calculated from fractional_sizes harmonic-mean-like.
    # The weights on each task (WT, M x 1) are calculated as requested by the user.
    # In the end: argmin SUM(ABS((A.X-T).WML).WT)
    # where X is the (N x S) binary solution matrix
    # where T is the (M x S) matrix of task fraction sizes (repeat of fractional_sizes)
    # constraint: assign one cluster to one and only one final ML subset
    # i.e. each row of X must sum to 1

    A = np.copy(tasks_vs_clusters_array)

    # Create WT = obj_weights
    if M > 1 and not equal_weight_perc_compounds_as_tasks:
        obj_weights = np.array([M-1] + [1] * (M-1))
    else:
        obj_weights = np.array([1] * M)

    obj_weights = obj_weights / np.sum(obj_weights)

    # Create WML
    sk_harmonic = (1 / fractional_sizes) / np.sum(1 / fractional_sizes)

    # Create the pulp model
    prob = LpProblem("Data_balancing", LpMinimize)

    # Create the pulp variables
    # x_names represent clusters, ML_subsets, and are binary variables
    x_names = ['x_'+str(i) for i in range(N * S)]
    x = [LpVariable(x_names[i], lowBound = 0, upBound = 1, cat = 'Integer') for i in range(N * S)]
    # X_names represent tasks, ML_subsets, and are continuous positive variables
    X_names = ['X_'+str(i) for i in range(M * S)]
    X = [LpVariable(X_names[i], lowBound = 0, cat = 'Continuous') for i in range(M * S)]

    # Add the objective to the model

    obj = []
    coeff = []
    for m in range(S):
        for t in range(M):
            obj.append(X[m*M+t])
            coeff.append(sk_harmonic[m] * obj_weights[t])

    prob += LpAffineExpression([(obj[i],coeff[i]) for i in range(len(obj)) ])

    # Add the constraints to the model

    # Constraints forcing each cluster to be in one and only one ML_subset
    for c in range(N):
        prob += LpAffineExpression([(x[c+m*N],+1) for m in range(S)]) == 1

    # Constraints related to the ABS values handling, part 1 and 2
    for m in range(S):
        for t in range(M):
            cs = [c for c in range(N) if A[t,c] != 0]
            prob += LpAffineExpression([(x[c+m*N],A[t,c]) for c in cs]) - X[t] <= fractional_sizes[m]
            prob += LpAffineExpression([(x[c+m*N],A[t,c]) for c in cs]) + X[t] >= fractional_sizes[m]

    # Solve the model
    prob.solve(PULP_CBC_CMD(gapRel = relative_gap, timeLimit = time_limit_seconds, threads = max_N_threads, msg=True))

    print("Status:", LpStatus[prob.status])

if __name__ == '__main__':

    tasks_vs_clusters_array = np.load('tasks_vs_clusters_array.npy')
    merge_clusters_with_balancing_mapping(tasks_vs_clusters_array, time_limit_seconds = 10, max_N_threads = 1)
```

What operating system are you using?

I'm using python version:

I installed PuLP via:


pchtsp commented 11 months ago

Hello! The two links are private, so we cannot access them. The timeLimit in CBC only kicks in after a pre-processing step, so it may be that the solver is spending too much time in pre-processing.
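One way to check whether pre-processing is the bottleneck is to pass extra options through to the CBC executable. A minimal sketch, assuming `PULP_CBC_CMD` forwards the strings in its `options` list to the `cbc` command line; `preprocess off` is a CBC command-line option, not a PuLP parameter:

```python
from pulp import PULP_CBC_CMD

# Illustration only: disable CBC's pre-processing and keep the 10-second
# wall-clock limit, to see whether pre-processing is what blows past it.
solver = PULP_CBC_CMD(
    timeLimit=10,
    gapRel=0,
    threads=1,
    msg=True,
    options=["preprocess off"],  # forwarded to the cbc command line
)
# prob.solve(solver)
```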

From the one-line log you showed, it seems you are using floating-point coefficients with very high precision (3.5152484e-05, 0.00018135401). I recommend rounding all numbers to a relevant number of significant digits (e.g., 3.142 instead of 3.14159265358979). This sometimes has a dramatic impact on time and performance in all solvers, including CBC.
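In the script from this issue that would mean rounding the normalized `tasks_vs_clusters_array` before the model is built; a sketch of the idea (6 digits is an arbitrary illustrative choice, not a CBC requirement):

```python
import numpy as np

# Round the fraction matrix so the model does not carry coefficients such as
# 3.5152484e-05 at full double precision; pick a precision that is still
# meaningful for the data.
tasks_vs_clusters_array = tasks_vs_clusters_array / tasks_vs_clusters_array.sum(axis=1, keepdims=True)
A = np.round(tasks_vs_clusters_array, 6)
```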

Finally, consider using a commercial solver (GUROBI, CPLEX) and check whether you observe the same behavior.
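Switching the back-end only requires changing the solver object passed to `prob.solve()`; a sketch, assuming the corresponding solver is installed and licensed on the machine:

```python
import pulp

# List the solver back-ends PuLP can actually find on this machine.
print(pulp.listSolvers(onlyAvailable=True))

# Same model, different solver; timeLimit/threads are passed analogously to
# PULP_CBC_CMD (uncomment whichever back-end is available).
# prob.solve(pulp.GUROBI_CMD(timeLimit=10, threads=1, msg=True))
# prob.solve(pulp.CPLEX_CMD(timeLimit=10, threads=1, msg=True))
```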

jgutierrezre commented 10 months ago

I am facing the same issue with a custom problem. Below is the log.


```
Version: 2.10.3
Build Date: Dec 15 2019

command line - cbc.exe 5132a2e48c924a18a651b5012afe81ce-pulp.mps sec 400 threads 16 timeMode elapsed branch printingOptions all solution 5132a2e48c924a18a651b5012afe81ce-pulp.sol (default strategy 1)
At line 2 NAME          MODEL
At line 3 ROWS
At line 20860 COLUMNS
At line 131612 RHS
At line 152468 BOUNDS
At line 162171 ENDATA
Problem MODEL has 20855 rows, 9702 columns and 91102 elements
Coin0008I MODEL read with 0 errors
seconds was changed from 1e+100 to 400
threads was changed from 0 to 16
Option for timeMode changed from cpu to elapsed
Continuous objective value is 295.791 - 6.32 seconds
Cgl0002I 1575 variables fixed
Cgl0003I 260 fixed, 201 tightened bounds, 8125 strengthened rows, 9488 substitutions
Cgl0003I 0 fixed, 0 tightened bounds, 443 strengthened rows, 1086 substitutions
Cgl0003I 0 fixed, 0 tightened bounds, 6952 strengthened rows, 0 substitutions
Cgl0003I 0 fixed, 0 tightened bounds, 6316 strengthened rows, 0 substitutions
Cgl0003I 0 fixed, 0 tightened bounds, 4721 strengthened rows, 0 substitutions
Cgl0003I 0 fixed, 0 tightened bounds, 2993 strengthened rows, 0 substitutions
Cgl0003I 0 fixed, 0 tightened bounds, 1634 strengthened rows, 0 substitutions
Cgl0003I 0 fixed, 0 tightened bounds, 512 strengthened rows, 0 substitutions
Cgl0003I 0 fixed, 0 tightened bounds, 112 strengthened rows, 0 substitutions
Cgl0003I 0 fixed, 0 tightened bounds, 15 strengthened rows, 0 substitutions
Cgl0004I processed model has 11801 rows, 3978 columns (3978 integer (3777 of which binary)) and 81622 elements
Cutoff increment increased from 1e-05 to 0.9999
Cbc0038I Initial state - 1779 integers unsatisfied sum - 417.456
Cbc0038I Pass   1: (12.88 seconds) suminf.  205.51872 (1232) obj. 353.416 iterations 2244
Cbc0038I Pass   2: (12.95 seconds) suminf.  169.61710 (998) obj. 376.348 iterations 362
Cbc0038I Pass   3: (12.99 seconds) suminf.  147.90778 (918) obj. 373.199 iterations 212
Cbc0038I Pass   4: (13.01 seconds) suminf.  132.07522 (886) obj. 379.413 iterations 101
Cbc0038I Pass   5: (13.04 seconds) suminf.  130.52551 (872) obj. 380.211 iterations 69
Cbc0038I Pass   6: (13.06 seconds) suminf.  129.85884 (871) obj. 380.545 iterations 29
Cbc0038I Pass   7: (13.07 seconds) suminf.  126.67302 (868) obj. 380.695 iterations 20
Cbc0038I Pass   8: (13.09 seconds) suminf.  126.33969 (866) obj. 380.695 iterations 22
Cbc0038I Pass   9: (13.11 seconds) suminf.  126.33969 (866) obj. 380.695 iterations 9
Cbc0038I Pass  10: (13.15 seconds) suminf.  121.89120 (815) obj. 381.594 iterations 178
Cbc0038I Pass  11: (13.16 seconds) suminf.  120.18138 (811) obj. 382.23 iterations 56
Cbc0038I Pass  12: (13.19 seconds) suminf.  105.02740 (798) obj. 384.137 iterations 81
Cbc0038I Pass  13: (13.21 seconds) suminf.  104.30309 (728) obj. 384.499 iterations 45
Cbc0038I Pass  14: (13.23 seconds) suminf.  100.08087 (716) obj. 383.999 iterations 31
Cbc0038I Pass  15: (13.24 seconds) suminf.  100.08087 (716) obj. 383.999 iterations 9
Cbc0038I Pass  16: (13.28 seconds) suminf.   94.10105 (677) obj. 386.709 iterations 154
Cbc0038I Pass  17: (13.30 seconds) suminf.   93.18186 (675) obj. 387.164 iterations 88
Cbc0038I Pass  18: (13.33 seconds) suminf.   92.81259 (713) obj. 388.16 iterations 131
Cbc0038I Pass  19: (13.34 seconds) suminf.   92.78168 (673) obj. 387.958 iterations 28
Cbc0038I Pass  20: (13.38 seconds) suminf.   87.15282 (604) obj. 389.7 iterations 204
Cbc0038I Pass  21: (13.41 seconds) suminf.   77.32782 (592) obj. 390.775 iterations 120
Cbc0038I Pass  22: (13.43 seconds) suminf.   78.06814 (588) obj. 391.116 iterations 51
Cbc0038I Pass  23: (13.45 seconds) suminf.   78.77782 (548) obj. 392.325 iterations 28
Cbc0038I Pass  24: (13.46 seconds) suminf.   76.27782 (543) obj. 392.825 iterations 21
Cbc0038I Pass  25: (13.48 seconds) suminf.   76.27782 (543) obj. 392.825 iterations 4
Cbc0038I Pass  26: (13.50 seconds) suminf.   73.76654 (535) obj. 393.314 iterations 37
Cbc0038I Pass  27: (13.51 seconds) suminf.   73.76654 (535) obj. 393.314 iterations 12
Cbc0038I Pass  28: (13.53 seconds) suminf.   73.76654 (535) obj. 393.314 iterations 16
Cbc0038I Pass  29: (13.55 seconds) suminf.   72.16654 (533) obj. 393.714 iterations 15
Cbc0038I Pass  30: (13.56 seconds) suminf.   72.16654 (533) obj. 393.714 iterations 25
Cbc0038I No solution found this major pass
Cbc0038I Before mini branch and bound, 1711 integers at bound fixed and 6 continuous
Cbc0038I Full problem 11801 rows 3978 columns, reduced to 5217 rows 1177 columns
Cbc0038I Mini branch and bound did not improve solution (13.73 seconds)
Cbc0038I After 13.74 seconds - Feasibility pump exiting - took 1.45 seconds
Cbc0031I 63 added rows had average density of 57.809524
Cbc0013I At root node, 63 cuts changed objective from 296.92593 to 301.40256 in 14 passes
Cbc0014I Cut generator 0 (Probing) - 140 row cuts average 2.8 elements, 0 column cuts (0 active)  in 0.439 seconds - new frequency is 1
Cbc0014I Cut generator 1 (Gomory) - 238 row cuts average 245.6 elements, 0 column cuts (0 active)  in 2.557 seconds - new frequency is 1
Cbc0014I Cut generator 2 (Knapsack) - 39 row cuts average 4.3 elements, 0 column cuts (0 active)  in 0.222 seconds - new frequency is 1
Cbc0014I Cut generator 3 (Clique) - 0 row cuts average 0.0 elements, 0 column cuts (0 active)  in 0.062 seconds - new frequency is -100
Cbc0014I Cut generator 4 (MixedIntegerRounding2) - 1 row cuts average 182.0 elements, 0 column cuts (0 active)  in 0.153 seconds - new frequency is -100
Cbc0014I Cut generator 5 (FlowCover) - 0 row cuts average 0.0 elements, 0 column cuts (0 active)  in 0.018 seconds - new frequency is -100
Cbc0014I Cut generator 6 (TwoMirCuts) - 1333 row cuts average 84.1 elements, 0 column cuts (0 active)  in 1.390 seconds - new frequency is 1
Cbc0014I Cut generator 7 (ZeroHalf) - 222 row cuts average 9.8 elements, 0 column cuts (0 active)  in 3.778 seconds - new frequency is 1
Cbc0010I After 0 nodes, 1 on tree, 1e+50 best solution, best possible 301.40256 (31.68 seconds)
```

robertHowlett commented 3 months ago

I fixed this in my case by downloading the latest CBC (2.10.11) and manually overwriting the binary that ships with PuLP.
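An alternative to overwriting the bundled binary is to point PuLP at an externally installed `cbc` executable; a sketch, assuming the standard `COIN_CMD` interface and a purely illustrative install path:

```python
from pulp import COIN_CMD

# Hypothetical location of a separately installed CBC 2.10.11 binary; adjust
# the path to wherever cbc.exe actually lives on your system.
solver = COIN_CMD(
    path=r"C:\tools\cbc-2.10.11\bin\cbc.exe",
    timeLimit=10,
    threads=1,
    msg=True,
)
# prob.solve(solver)
```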