dstl / Stone-Soup

A software project to provide the target tracking community with a framework for the development and testing of tracking algorithms.
https://stonesoup.rtfd.io
MIT License
400 stars 131 forks source link

Change 2d assignment implementation to SciPy version #717

Closed sdhiscocks closed 1 year ago

sdhiscocks commented 1 year ago

SciPy changed from Hungarian to the same algorithm that was provided to us from David Crouse, based on modified Jonker-Volgenant algorithm. It's preferable to use SciPy C implementation, than our Python NumPy implementation.

codecov[bot] commented 1 year ago

Codecov Report

Base: 94.58% // Head: 94.66% // Increases project coverage by +0.08% :tada:

Coverage data is based on head (38449c4) compared to base (501294b). Patch coverage: 75.00% of modified lines in pull request are covered.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #717 +/- ## ========================================== + Coverage 94.58% 94.66% +0.08% ========================================== Files 172 171 -1 Lines 8841 8773 -68 Branches 1721 1707 -14 ========================================== - Hits 8362 8305 -57 + Misses 346 342 -4 + Partials 133 126 -7 ``` | Flag | Coverage Δ | | |---|---|---| | integration | `70.03% <75.00%> (-0.10%)` | :arrow_down: | | unittests | `90.75% <75.00%> (+0.05%)` | :arrow_up: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dstl#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://codecov.io/gh/dstl/Stone-Soup/pull/717?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dstl) | Coverage Δ | | |---|---|---| | [stonesoup/dataassociator/neighbour.py](https://codecov.io/gh/dstl/Stone-Soup/pull/717/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dstl#diff-c3RvbmVzb3VwL2RhdGFhc3NvY2lhdG9yL25laWdoYm91ci5weQ==) | `94.11% <75.00%> (+0.07%)` | :arrow_up: | Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dstl). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dstl)

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

sdhiscocks commented 1 year ago

Should have added, that I tested this on a large cost matrix and got identical results with both implementations.

gawebb-dstl commented 1 year ago

It’s worth noting that the implementations are slightly different. This difference doesn’t affect the GNNWith2DAssignment class but may affect other functions/classes which use the algorithm For non-square cost arrays the numpy implementation explicitly uses ‘-1’ to signify that a non-match while the scipy implementation ignores indices that do not match. They are implicitly absence. The other difference is the row_ind (row4col in the numpy implementation) output from the scipy implementation is sorted while the numpy implementation isn’t.

The following script highlights the difference

from stonesoup.dataassociator._assignment import assign2D
from scipy.optimize import linear_sum_assignment
import numpy as np
from numpy.random import default_rng

def numpy_version(C, maximise=True):
    C2 = np.copy(C)
    gain, col4row, row4col = assign2D(C2, maximise)

    if gain.size <= 0:
        return None, None
    else:
        return col4row, row4col

def scipy_version(C, maximise=True):
    C2 = np.copy(C)
    try:
        row4col, col4row = linear_sum_assignment(C2, maximise)
    except ValueError:
        return None, None
    else:
        return col4row, row4col

def compare_versions(C, print_stuff=False):
    n_c4r, n_r4c = numpy_version(C)
    s_c4r, s_r4c = scipy_version(C)

    c4r_match = np.array_equal(n_c4r, s_c4r)
    r4c_match = np.array_equal(n_r4c, s_r4c)

    if print_stuff:
        if c4r_match and r4c_match:
            print("Everything matches")
        else:
            if not c4r_match:
                print("Col4row:", "Numpy answer", n_c4r, "doesn't equal Scipy answer", s_c4r, "for cost matrix size", C.shape)
            if not r4c_match:
                print("Row4col:", "Numpy answer", n_r4c, "doesn't equal Scipy answer", s_r4c, "for cost matrix size", C.shape)
            #print("For Cost Matrix", C)

    return c4r_match and r4c_match

cost_matrix = np.array([[5, 2], [4,11], [16, 3]])
numpy_col4row, numpy_row4col = numpy_version(cost_matrix)
scipy_col_ind, scipy_row_ind = scipy_version(cost_matrix)

print("Cost Matrix:", cost_matrix)
print("Numpy result:", numpy_col4row, numpy_row4col)
print("Scipy result:", scipy_col_ind, scipy_row_ind)
print("Numpy col4row", numpy_col4row, "explicitly ‘-1’ to signify that there is no match in the first row")
# You could get the cost via either of these two methods for the numpy implementation
costs = [cost_matrix[i, j] for i, j in enumerate(numpy_col4row) if j != -1]
costs = [cost_matrix[i, j] for j, i in enumerate(numpy_row4col) if i != -1]

# The scipy version makes it more convenient to get the cost values
costs = cost_matrix[scipy_row_ind, scipy_col_ind]

print()
print("Other random examples")

rng = default_rng()
rints = rng.integers(low=2, high=10, size=10)

print("Square arrays")
for size_of_array in rints:
    C = np.random.rand(size_of_array, size_of_array)
    do_they_match = compare_versions(C, True)

print("Non-Square arrays")
rints2 = rng.integers(low=2, high=10, size=(10,2))
for (i,j) in rints2:
    C = np.random.rand(i, j)
    do_they_match = compare_versions(C, True)