Hi Alex! Thank you again for your excellent work! I have a question about Cascade Matching in DeepSORT. It may relate more to the original algorithm than to this implementation.
Cascade Matching has two parts: Mahalanobis Distance and Cosine Distance. Cosine Distance measures the appearance similarity between a detection and a track's prediction. So I would expect both the detection bboxes and the track bboxes to be passed through the extractor, and the two resulting sets of vectors to be the inputs to the Cosine Matching for comparison.
But in the code, only the detection bboxes are passed to the extractor, producing the embeddings seen at line 159 of mot.py: self.tracker.update(self.frame_count, detections, embeddings)
and the Cosine Matching part is shown in line 330 in tracker.py:
cost = cdist(features, embeddings, self.metric, empty_mask, fill_val)
Here embeddings is the extractor's output for the detections, while features belongs to the tracks (the predictions), and it looks like features doesn't come from the extractor (that is, it is not produced by the Re-ID model). Can these two seemingly "different" things be compared, and is this correct?
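To make the question concrete, here is a minimal sketch of the comparison I have in mind. It is not the repo's code: cosine_cost is a hypothetical helper I wrote with NumPy, it ignores the empty_mask and fill_val arguments of the repo's cdist, and the vectors are made-up 4-dimensional placeholders rather than real Re-ID outputs. It only shows what a cosine-distance cost matrix would compute if features and embeddings both lived in the same embedding space, which is exactly the part I am unsure about.

```python
import numpy as np


def cosine_cost(features: np.ndarray, embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine distance (hypothetical helper, not the repo's cdist).

    Rows correspond to tracks, columns to detections.
    """
    # L2-normalize each row so the dot product equals the cosine similarity.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Cosine distance = 1 - cosine similarity.
    return 1.0 - f @ e.T


# Placeholder vectors (assumption): 2 tracks and 3 detections.
features = np.array([[1.0, 0.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0, 0.0]])
embeddings = np.array([[2.0, 0.0, 0.0, 0.0],
                       [0.0, 1.0, 1.0, 0.0],
                       [0.0, 0.0, 0.0, 3.0]])

cost = cosine_cost(features, embeddings)
print(cost.shape)  # one row per track, one column per detection
```

A small entry in cost would mean that a track and a detection look alike, but that interpretation only holds if both sets of vectors come from the same model.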
I hope my question makes sense! Maybe it's a silly one. Thanks in advance~