DIUx-xView / xview3-reference

Reference data processing code and model for the xView3 prize challenge.

Distance-based matching in metric.py is broken #9

Closed: favyen2 closed this issue 2 years ago

favyen2 commented 2 years ago

I don't think this code in metric.py computes precision and recall scores in a reasonable way:

dist_mat = distance_matrix(pred_array, gt_array, p=2) * PIX_TO_M
rows, cols = linear_sum_assignment(dist_mat)
tp_inds = [
    {"pred_idx": preds.index[rows[ii]], "gt_idx": gt.index[cols[ii]]}
    for ii in range(len(rows))
    if dist_mat[rows[ii], cols[ii]] < distance_tolerance
]

Consider this example:

Ground truth: (100, 100); (200, 200); (300, 300)
Prediction: (101, 101); (201, 201); (50, 50)

What do you think the precision and recall should be?

I think they should be 2/3 each: the first two predictions are each within tolerance of a distinct ground truth point, giving 2 true positives, 1 false positive, and 1 false negative. But metric.py reports both precision and recall as zero, because the matching that minimizes total distance pairs every prediction with a ground truth point beyond the tolerance.

EXPECTED MATCHING:
(101, 101) <-> (100, 100)
(201, 201) <-> (200, 200)
(50, 50) <-> (300, 300)
Cost is sqrt(2)*(1 + 1 + 250) = sqrt(2)*252 ≈ 356.4.

"BEST" (MINIMUM-COST) MATCHING:
(50, 50) <-> (100, 100)
(101, 101) <-> (200, 200)
(201, 201) <-> (300, 300)
Cost is sqrt(2)*(50 + 99 + 99) = sqrt(2)*248 ≈ 350.7. This is lower, so linear_sum_assignment picks it, even though every pair in it is farther apart than the tolerance.
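
For reference, a quick self-contained sketch (plain numpy, nothing from metric.py) that verifies both totals:

import numpy

gt = numpy.array([[100, 100], [200, 200], [300, 300]], dtype=float)
pred = numpy.array([[101, 101], [201, 201], [50, 50]], dtype=float)

# Expected matching: each of the first two predictions to its nearest
# ground truth, the stray (50, 50) to the leftover (300, 300).
expected = sum(numpy.linalg.norm(pred[i] - gt[i]) for i in range(3))

# Minimum-cost matching: every prediction paired with the "wrong" ground truth.
best = (numpy.linalg.norm(pred[2] - gt[0])
        + numpy.linalg.norm(pred[0] - gt[1])
        + numpy.linalg.norm(pred[1] - gt[2]))

print(expected)  # sqrt(2)*252 = 356.38...
print(best)      # sqrt(2)*248 = 350.72...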

I think this clearly shows the current distance-based matching in metric.py is broken and needs to be fixed.

I don't quite know the best way to fix it, but one solution is adding an extra line:

dist_mat = distance_matrix(pred_array, gt_array, p=2) * PIX_TO_M

# New line -- dramatically increase the cost of infeasible edges so the
# solver only selects them when no feasible pairing exists. Set the
# sentinel to something larger than the scene width/height (in meters).
dist_mat[dist_mat >= distance_tolerance] = 99999

rows, cols = linear_sum_assignment(dist_mat)
...

You might think this is a small issue, but after making this change, I observe F1 score increases of up to 5 percentage points (0.05 out of 1.0) from the same predictions CSV!

Here is the test case so you can easily reproduce it (it reports 0 true positives):

import numpy
import scipy.optimize
import scipy.spatial

pred_array = numpy.array([[101, 101], [201, 201], [50, 50]], dtype='float32')
gt_array = numpy.array([[100, 100], [200, 200], [300, 300]], dtype='float32')

dist_mat = scipy.spatial.distance_matrix(pred_array, gt_array, p=2)
rows, cols = scipy.optimize.linear_sum_assignment(dist_mat)

# Count matched pairs within a tolerance of 20: prints 0, even though the
# first two predictions are each within 20 of a ground truth point.
print(len([i for i in range(len(rows)) if dist_mat[rows[i], cols[i]] < 20]))
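
And as a sanity check of the proposed fix (a self-contained sketch: the tolerance of 20 is just the toy value from the test case, and PIX_TO_M is omitted), capping the infeasible edges makes the same example report 2 true positives:

import numpy
import scipy.optimize
import scipy.spatial

pred_array = numpy.array([[101, 101], [201, 201], [50, 50]], dtype='float32')
gt_array = numpy.array([[100, 100], [200, 200], [300, 300]], dtype='float32')
distance_tolerance = 20  # toy value, as in the test case above

dist_mat = scipy.spatial.distance_matrix(pred_array, gt_array, p=2)
dist_mat[dist_mat >= distance_tolerance] = 99999  # penalize infeasible edges

rows, cols = scipy.optimize.linear_sum_assignment(dist_mat)
print(sum(dist_mat[r, c] < distance_tolerance for r, c in zip(rows, cols)))  # 2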
favyen2 commented 2 years ago

Actually, I think the best approach is this:

dist_mat = distance_matrix(pred_array, gt_array, p=2) * PIX_TO_M
# 1 for edges beyond the tolerance, 0 for feasible edges.
bin_mat = (dist_mat > distance_tolerance).astype('int32')
rows, cols = linear_sum_assignment(bin_mat)
tp_inds = [
    {"pred_idx": preds.index[rows[ii]], "gt_idx": gt.index[cols[ii]]}
    for ii in range(len(rows))
    if bin_mat[rows[ii], cols[ii]] == 0
]

We want to maximize the number of matches within the distance tolerance, so we should operate on a binary matrix. There's no reason to perform the bipartite matching over the raw distance matrix if our objective is to compute the best achievable precision/recall under the distance tolerance constraint.
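
On the toy example from above (same assumed tolerance of 20, PIX_TO_M again omitted), this variant also recovers the two true positives:

import numpy
import scipy.optimize
import scipy.spatial

pred_array = numpy.array([[101, 101], [201, 201], [50, 50]], dtype='float32')
gt_array = numpy.array([[100, 100], [200, 200], [300, 300]], dtype='float32')
distance_tolerance = 20  # toy value, as in the test case above

dist_mat = scipy.spatial.distance_matrix(pred_array, gt_array, p=2)
bin_mat = (dist_mat > distance_tolerance).astype('int32')

# Minimizing the sum of 0/1 costs maximizes the number of matches
# within the tolerance.
rows, cols = scipy.optimize.linear_sum_assignment(bin_mat)
print(sum(bin_mat[r, c] == 0 for r, c in zip(rows, cols)))  # 2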

jdunnmon commented 2 years ago

@favyen2 thanks for the detailed issue, appreciate the effort that went in here. A couple of concrete thoughts:

(1) The reason we don't use a binary cost matrix is that all of the downstream classification tasks rely on specific pairings of predictions with ground truth, so we want the ground truth detection closest to a given prediction to be paired with that prediction whenever possible (while also discouraging egregious false positives). The binary cost matrix makes this difficult because exact distance information is lost; see the sketch after these points.

(2) The idea of putting a very large cost on any pairs with distance over the tolerance is an interesting one, and could make sense depending on exactly what incentives one is aiming to set with the metric. As we briefly discussed offline, we'll look into the relative effects of the two implementations and report back.
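
To make (1) concrete, here is a hypothetical two-detection scene (the coordinates and the tolerance of 20 are invented for illustration): both possible pairings are feasible, so the binary cost cannot distinguish them, while the raw distances uniquely prefer the nearest pairing.

import numpy
import scipy.spatial

gt = numpy.array([[0, 0], [10, 10]], dtype='float32')
pred = numpy.array([[1, 1], [9, 9]], dtype='float32')

d = scipy.spatial.distance_matrix(pred, gt, p=2)
b = (d > 20).astype('int32')  # all zeros: every pairing is feasible

# The two possible assignments have identical binary cost...
print(b[0, 0] + b[1, 1], b[0, 1] + b[1, 0])  # 0 0
# ...but only the distance cost prefers the nearest pairing, which the
# downstream per-pair classification scoring depends on.
print(d[0, 0] + d[1, 1], d[0, 1] + d[1, 0])  # ~2.83 vs ~25.46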

Thanks again!

timatcambrio commented 2 years ago

Thank you again, @favyen2! We are closing this since we have implemented your suggestion in #13.