Closed ben-eysenbach closed 6 years ago
@gatagat - fyi, there is also related discussion in src-d/lapjv#3.
@ben-eysenbach I think you are not interpreting the results correctly. The returned values are different because of the different representation of assignments.
SciPy's row, col = linear_sum_assignment(cost)
should be interpreted as row[i] <-> col[i].
lap's x, y = lapjv(cost)
should be interpreted as j <-> x[j] (or y[i] <-> i)
If you have an idea how to make this clearer in the docs, let me know. Now it reads:
x: vector of columns assigned to rows
y: vector of rows assigned to columns
@gatagat - Thanks for the clarification!
Here's one alternative way this could be explained in the function docstring:
Args:
C (array): an (N x M) array specifying assignment costs.
Entry C[i, j] is the cost of assigning row i to column j.
Outputs:
x (array): a size-N array specifying to which column is row is assigned.
y (array): a size-M array specifying to which row each column is assigned.
Here's some discussion that might be helpful in the documentation/readme:
The function lapjv(C)
returns two arrays, x, y
. If cost matrix C
has shape N x M, then x
is a size-N array specifying to which column is row is assigned, and y
is a size-M array specifying to which row each column is assigned. For example, an output of x = [1, 0]
indicates that row 0 is assigned to column 1 and row 1 is assigned to column 0. Similarly, an output of x = [2, 1, 0]
indicates that row 0 is assigned to column 2, row 1 is assigned to column 1, and row 2 is assigned to column 0.
Note that this function does not return the assignment matrix (as done by scipy's linear_sum_assignment
and lapsolver's solve dense
). The assignment matrix can be constructed from x
as follows:
A = np.zeros((N, M))
for i in range(N):
A[i, x[i]] = 1
Equivalently, we could construct the assignment matrix from y
:
A = np.zeros((N, M))
for j in range(M):
A[y[j], j] = 1
Finally, note that the outputs are redundant: we can construct x
from y
, and vise versa:
x = [np.where(y == i)[0][0] for i in range(N)]
y = [np.where(x == j)[0][0] for j in range(M)]
@ben-eysenbach Cool, can you put this into a PR?
Sure!
Thanks!
It appears that this library sometimes returns incorrect results. The example below shows two failure cases. For both problem instances, this library incorrectly assigns both rows to the same column. For reference, I have also included the (presumably correct) results returned by the scipy implementation.
Result
Please let me know if I'm using the library incorrectly, or if there are certain types of inputs to avoid.