ben-eysenbach commented 6 years ago

It appears that this library sometimes returns incorrect results. The example below shows two failure cases. For both problem instances, this library incorrectly assigns both rows to the same column. For reference, I have also included the (presumably correct) results returned by the scipy implementation.

import scipy.optimize
import lap
import numpy as np

C1 = np.array([[8., 9.],
               [1., 4.]])
C2 = np.array([[5., 1.],
               [8., 9.]])

for C in [C1, C2]:
    print('\n%s' % C)
    print('LAPJV result: %s, %s' % lap.lapjv(C)[1:])
    print('Scipy result: %s, %s' % scipy.optimize.linear_sum_assignment(C))

Result

[[8. 9.]
 [1. 4.]]
LAPJV result: [1 0], [1 0]
Scipy result: [0 1], [1 0]

[[5. 1.]
 [8. 9.]]
LAPJV result: [1 0], [1 0]
Scipy result: [0 1], [1 0]

Please let me know if I'm using the library incorrectly, or if there are certain types of inputs to avoid.

dribnet commented 6 years ago

@gatagat - fyi, there is also related discussion in src-d/lapjv#3.

gatagat commented 6 years ago

@ben-eysenbach I think you are not interpreting the results correctly. The returned values are different because of the different representation of assignments.

SciPy's row, col = linear_sum_assignment(cost) should be interpreted as row[i] <-> col[i].

lap's x, y = lapjv(cost) should be interpreted as j <-> x[j] (or y[i] <-> i)

If you have an idea how to make this clearer in the docs, let me know. Now it reads:

  x: vector of columns assigned to rows
  y: vector of rows assigned to columns

ben-eysenbach commented 6 years ago

@gatagat - Thanks for the clarification!

Here's one alternative way this could be explained in the function docstring:

Args:
    C (array): an (N x M) array specifying assignment costs.
        Entry C[i, j] is the cost of assigning row i to column j.
Outputs:
    x (array): a size-N array specifying to which column is row is assigned.
    y (array): a size-M array specifying to which row each column is assigned.

Here's some discussion that might be helpful in the documentation/readme:

The function lapjv(C) returns two arrays, x, y. If cost matrix C has shape N x M, then x is a size-N array specifying to which column is row is assigned, and y is a size-M array specifying to which row each column is assigned. For example, an output of x = [1, 0] indicates that row 0 is assigned to column 1 and row 1 is assigned to column 0. Similarly, an output of x = [2, 1, 0] indicates that row 0 is assigned to column 2, row 1 is assigned to column 1, and row 2 is assigned to column 0.

Note that this function does not return the assignment matrix (as done by scipy's linear_sum_assignment and lapsolver's solve dense). The assignment matrix can be constructed from x as follows:

A = np.zeros((N, M))
for i in range(N):
    A[i, x[i]] = 1

Equivalently, we could construct the assignment matrix from y:

A = np.zeros((N, M))
for j in range(M):
    A[y[j], j] = 1

Finally, note that the outputs are redundant: we can construct x from y, and vise versa:

x = [np.where(y == i)[0][0] for i in range(N)]
y = [np.where(x == j)[0][0] for j in range(M)]

gatagat commented 6 years ago

@ben-eysenbach Cool, can you put this into a PR?

ben-eysenbach commented 6 years ago

Sure!

11

gatagat commented 6 years ago

Thanks!

gatagat / lap

Incorrect results? #10

11