gatagat / lap

Linear Assignment Problem solver (LAPJV/LAPMOD).
BSD 2-Clause "Simplified" License

Rectangular problems not supported by lapmod()? #14

Closed emanuele closed 5 years ago

emanuele commented 5 years ago

I tried to execute lapmod() on a simple rectangular problem (10x20), but got a segmentation fault. In contrast, lapjv supports rectangular problems. Here is the code to reproduce, with sparse_from_dense() taken from your tests:

## test_rlap.py ##

import numpy as np
from lap import lapjv, lapmod

def sparse_from_dense(cost):
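    # Convert a dense cost matrix into the sparse (cc, ii, kk) format that
    # lapmod expects: cc holds the costs row by row, ii the CSR-style row
    # start offsets (length n_rows + 1), kk the column index of each entry.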
    cc = cost.flatten()
    n_rows = cost.shape[0]
    n_columns = cost.shape[1]
    ii = np.empty((n_rows+1,), dtype=int)
    ii[0] = 0
    ii[1:] = n_columns
    ii = np.cumsum(ii)
    kk = np.tile(np.arange(n_columns, dtype=int), n_rows)
    return n_rows, cc, ii, kk

if __name__ == '__main__':
    n_rows = 10
    n_cols = 20
    min_value = 0
    max_value = 1000
    cost = np.random.randint(low=min_value, high=max_value, size=(n_rows, n_cols))
    cost_opt, row_col, col_row = lapjv(cost, extend_cost=True)
    print("lapjv: %s" % cost_opt)
    n_rows, cc, ii, kk = sparse_from_dense(cost)
    cost_opt, col_row, row_col = lapmod(n_rows, cc, ii, kk)
    print("lapmod: %s" % cost_opt)

## executed:
In [1]: run test_rlap.py
lapjv: 444.0
*** Error in `/usr/bin/python': malloc(): memory corruption: 0x0000000002f907d0 ***
...

Are there plans to add support for rectangular problems in lapmod()? Am I missing something?

Thanks for your efforts and code!

emanuele commented 5 years ago

Conversely, if the rectangular cost matrix has more rows than columns, e.g. n_rows=20, n_cols=10, then lapmod() does not segfault, but the result is puzzling: the optimized final cost is "infinite" and the assignment is wrong, not matching that of lapjv() at all, which works correctly in this case too.

gatagat commented 5 years ago

Yes, you are completely right: lapjv can deal with non-square matrices, but lapmod cannot.

This has already been brought up once, see https://github.com/gatagat/lap/issues/4#issuecomment-314847414

gatagat commented 5 years ago

What would be ideal is to make lapmod(..., extend_cost=True) analogous to lapjv(..., extend_cost=True), as that is what everybody expects.

I'd be happy to include a PR that adds this.
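
For reference, the dense-side trick behind extending a rectangular problem to a square one is just padding with a constant cost. The sketch below only illustrates that idea (pad_to_square is a made-up helper name here, and this is not necessarily how lapjv implements extend_cost internally); a lapmod equivalent would have to do the same directly on the sparse representation:

import numpy as np
from lap import lapjv

def pad_to_square(cost):
    # Embed the rectangular cost matrix in a square one.  Each dummy row
    # (or column) is matched exactly once, so a constant padding value only
    # shifts the total cost and does not change the optimal real assignment.
    n_rows, n_cols = cost.shape
    n = max(n_rows, n_cols)
    padded = np.zeros((n, n), dtype=float)
    padded[:n_rows, :n_cols] = cost
    return padded

cost = np.random.randint(0, 1000, size=(10, 20)).astype(float)
cost_opt, x, y = lapjv(pad_to_square(cost))
print(cost_opt, x[:10])  # column assigned to each of the 10 real rows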

emanuele commented 5 years ago

First of all, thank you for the prompt answer and the code. I tried to use the code you referenced, but it does not seem to work because the results of lapjv and lapmod differ:

import numpy as np
from lap import lapjv, lapmod
from lapmod import prepare_sparse_cost
from scipy.sparse import coo_matrix, csr_matrix

def lapmod_rectangular(cost_coo, cost_limit=1e3):
    cc = cost_coo.data
    ii = cost_coo.row
    jj = cost_coo.col
    shape = cost_coo.shape
    n_rows = shape[0]
    cc, ii, kk = prepare_sparse_cost(shape, cc, ii, jj, cost_limit)
    ind1, ind0 = lapmod(len(ii)-1, cc, ii, kk, return_cost=False)
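    # Indices pointing into the padded (dummy) part of the extended matrix
    # mean "unassigned": map them to -1 and cut the index vectors back to
    # the original rectangular shape.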
    ind1[ind1 >= shape[1]] = -1
    ind0[ind0 >= shape[0]] = -1
    ind1 = ind1[:shape[0]]
    ind0 = ind0[:shape[1]]
    return ind1, ind0

if __name__ == '__main__':
    np.random.seed(0)
    n_rows = 5
    n_cols = 10
    min_value = 0
    max_value = 10
    cost = np.random.randint(low=min_value, high=max_value, size=(n_rows, n_cols))
    print("cost matrix:")
    print(cost)
    cost_opt, x, y = lapjv(cost, extend_cost=True, return_cost=True)
    print("lapjv cost_opt: %s" % cost_opt)
    print("lapjv x: %s" % (x,))
    print("lapjv y: %s" % (y,))
    cost_coo = coo_matrix(cost)
    cost_limit = 1e3
    ind1, ind0 = lapmod_rectangular(cost_coo, cost_limit)
    print("lapmod ind1: %s" % (ind1,))
    print("lapmod ind0: %s" % (ind0,))
    print("lapmod cost_opt: %s" % cost[range(n_rows), ind1].sum())

### execution ###
cost matrix:
[[5 0 3 3 7 9 3 5 2 4]
 [7 6 8 8 1 6 7 7 8 1]
 [5 9 8 9 4 3 0 3 5 0]
 [2 3 8 1 3 3 3 7 0 1]
 [9 9 0 4 7 3 2 7 2 0]]
lapjv cost_opt: 1.0
lapjv x: [1 4 9 8 2]
lapjv y: [-1  0  4 -1  1 -1 -1 -1  3  2]
lapmod ind1: [8 9 7 3 6]
lapmod ind0: [-1 -1 -1  3 -1 -1  4  2  0  1]
lapmod cost_opt: 9

Am I doing something wrong?

I'm targeting (very) large (very) sparse (slightly) rectangular cost matrices, around 1Mx1M. I'm also having a look at the auction algorithm to understand whether it could address my problems or not. Would you suggest it in such a setting? Unfortunately, I cannot find a nice Python implementation of it, like your lapmod().

gatagat commented 5 years ago

Thanks for the test case, that helps a lot.

From a quick look it really seems there is some issue with the way the matrix gets extended in prepare_sparse_cost. I'll try to find out what's happening and let you know.

Concerning the auction algorithm, I am not aware of a nice Python implementation.

gatagat commented 5 years ago

OK, the way the cost is extended in lapjv and the proposed extension for lapmod above differ from each other, which could in principle cause some issues and I am looking into it, but the issue here is elsewhere.

Your usage of scipy.sparse.coo_matrix assumes that the zero values in your cost matrix are the ones that can be left out, which is wrong. All entries "missing" from the sparse cost matrix are assumed to be infinite (not zero). So in the test case this effectively excludes all originally zero entries from the lapmod solution.

If you modify the original cost as follows, you will get the cost that lapmod is actually using. For this modified cost matrix, lapjv will also output an optimal solution with an optimum of 9.

cost[cost == 0] = cost_limit
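
Alternatively, if zero is a legitimate cost in your problem, you can keep those entries by building cc, ii, jj by hand instead of going through coo_matrix. Just a sketch, assuming every entry of the dense cost matrix is a valid assignment:

n_rows, n_cols = cost.shape
ii, jj = np.divmod(np.arange(n_rows * n_cols), n_cols)  # one (row, col) pair per entry
cc = cost.ravel().astype(float)                          # explicit zeros are kept

These then go into prepare_sparse_cost exactly as before.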

Does this make sense? Any ideas on how to clarify this in the docs?

emanuele commented 5 years ago

Yes! Sorry for the rookie mistake! Indeed, avoiding zeros in the cost matrix solves the problem, even just by setting min_value = 1:

cost matrix:
[[6 1 4 4 8 4 6 3 5 8]
 [7 9 9 2 7 8 8 9 2 6]
 [9 5 4 1 4 6 1 3 4 9]
 [2 4 4 4 8 1 2 1 5 8]
 [4 3 8 3 1 1 5 6 6 7]]
lapjv cost_opt: 6.0
lapjv x: [1 8 6 7 4]
lapjv y: [-1  0 -1 -1  4 -1  2  3  1 -1]
lapmod ind1: [1 8 6 5 4]
lapmod ind0: [-1  0 -1 -1  4  3  2 -1  1 -1]
lapmod cost_opt: 6

(the two solutions of lapjv and lapmod are slightly different but equivalent in terms of cost).

About how to describe this issue in the docs: clearly, coo_matrix(cost) is not a straightforward way to get cc, ii, jj, because of its handling of zeros as special values, i.e. as missing data that lapmod() will instead treat as infinite, as you pointed out. Maybe add "WARNING: scipy.sparse.coo_matrix(cost) will not return the correct (cc, ii, jj) if cost has some zero values." to the docstring of prepare_sparse_cost()? Alternatively, scipy.sparse.coo_matrix() should trigger a warning every time it is instantiated from a dense matrix with some zero values. After all, those values will disappear, and such behavior is not even documented(!).
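
A minimal illustration of the silent dropping (values here are arbitrary):

import numpy as np
from scipy.sparse import coo_matrix

cost = np.array([[5, 0], [0, 1]])
m = coo_matrix(cost)
print(m.nnz)         # 2, not 4: the explicit zero entries are gone
print(m.row, m.col)  # only the indices of the non-zero entries remain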

gatagat commented 5 years ago

Thanks for the input, I've updated the docs of prepare_sparse_cost.