NirAharon / BoT-SORT

BoT-SORT: Robust Associations Multi-Pedestrian Tracking
MIT License

`fuse_score` vs `fuse_motion` #60

Open mikel-brostrom opened 1 year ago

mikel-brostrom commented 1 year ago

Why is it that fuse_motion, i.e.:

raw_emb_dists = matching.embedding_distance(strack_pool, detections)
dists = matching.fuse_motion(self.kalman_filter, raw_emb_dists, strack_pool, detections)
matches, u_track, u_detection = matching.linear_assignment(dists, thresh=self.match_thresh)

Gives much worse results than fuse_score:

ious_dists = matching.iou_distance(strack_pool, detections)
ious_dists_mask = (ious_dists > self.proximity_thresh)

ious_dists = matching.fuse_score(ious_dists, detections)

emb_dists = matching.embedding_distance(strack_pool, detections) / 2.0
raw_emb_dists = emb_dists.copy()
emb_dists[emb_dists > self.appearance_thresh] = 1.0
emb_dists[ious_dists_mask] = 1.0
dists = np.minimum(ious_dists, emb_dists)

Given that both use motion information (the Kalman filter (KF) state distribution and IoU gating, respectively)?
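To make the comparison concrete, here is a minimal standalone sketch of how I understand the two cost matrices to be built. It uses only NumPy and made-up numbers. The helper names `fuse_score_sketch` and `fuse_motion_sketch`, the gate value (the 95% chi-square quantile for 4 degrees of freedom) and `lambda_=0.98` are assumptions for illustration, not the repository's code.

```python
import numpy as np

def fuse_score_sketch(iou_dists, det_scores):
    """Assumed behavior: scale IoU similarity by detection confidence."""
    iou_sim = 1.0 - iou_dists
    return 1.0 - iou_sim * det_scores[np.newaxis, :]

def fuse_motion_sketch(emb_dists, gating_dists, gate=9.4877, lambda_=0.98):
    """Assumed behavior: reject pairs outside the motion gate, then blend
    appearance and motion distances with weight lambda_."""
    cost = emb_dists.copy()
    cost[gating_dists > gate] = np.inf  # gated-out pairs can never match
    return lambda_ * cost + (1.0 - lambda_) * gating_dists

# Toy data: 2 tracks x 2 detections (hypothetical values).
iou_dists = np.array([[0.2, 0.9], [0.8, 0.3]])
emb_dists = np.array([[0.1, 0.6], [0.7, 0.2]])
gating_dists = np.array([[1.0, 20.0], [15.0, 2.0]])  # Mahalanobis-style distances
det_scores = np.array([0.9, 0.5])

score_cost = fuse_score_sketch(iou_dists, det_scores)
motion_cost = fuse_motion_sketch(emb_dists, gating_dists)

print(score_cost)
print(motion_cost)
```

In this sketch the blended cost stays almost entirely appearance-driven once a pair passes the gate, and it is not bounded to [0, 1] the way the IoU-based cost is, so the same `match_thresh` may not mean the same thing on both paths.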

mikel-brostrom commented 1 year ago

fuse_score results:

HOTA: exp368-pedestrian            HOTA      DetA      AssA      DetRe     DetPr     AssRe     AssPr     LocA      OWTA      HOTA(0)   LocA(0)   HOTALocA(0)
COMBINED                           56.389    54.99     58.369    59.847    76.449    63.768    79.252    82.086    59.019    72.889    77.075    56.18     

CLEAR: exp368-pedestrian           MOTA      MOTP      MODA      CLR_Re    CLR_Pr    MTR       PTR       MLR       sMOTA     CLR_TP    CLR_FN    CLR_FP    IDSW      MT        PT        ML        Frag      
COMBINED                           65.496    79.602    65.771    72.027    92.008    35.95     46.281    17.769    50.804    67501     26215     5863      258       174       224       86        1651      

Identity: exp368-pedestrian        IDF1      IDR       IDP       IDTP      IDFN      IDFP      
COMBINED                           70.634    62.965    80.432    59008     34708     14356 

fuse_motion results:

HOTA: exp368-pedestrian            HOTA      DetA      AssA      DetRe     DetPr     AssRe     AssPr     LocA      OWTA      HOTA(0)   LocA(0)   HOTALocA(0)
COMBINED                           54.251    54.57     54.529    59.663    75.919    60.838    74.501    82.033    56.931    70.248    76.793    53.946    

CLEAR: exp368-pedestrian           MOTA      MOTP      MODA      CLR_Re    CLR_Pr    MTR       PTR       MLR       sMOTA     CLR_TP    CLR_FN    CLR_FP    IDSW      MT        PT        ML        Frag      
COMBINED                           64.507    79.602    65.362    71.975    91.586    36.777    46.694    16.529    49.825    67452     26264     6197      802       178       226       80        2044      

Identity: exp368-pedestrian        IDF1      IDR       IDP       IDTP      IDFN      IDFP      
COMBINED                           67.47     60.247    76.662    56461     37255     17188     

Both runs use my custom object detection and ReID models.
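To make the gap easier to read, here is a small script that diffs the headline numbers copied from the two COMBINED rows above. The dictionary names are just for illustration; the values are taken verbatim from the tables.

```python
# Headline metrics copied from the COMBINED rows of the two runs above.
fuse_score = {"HOTA": 56.389, "DetA": 54.99, "AssA": 58.369,
              "MOTA": 65.496, "IDF1": 70.634, "IDSW": 258, "Frag": 1651}
fuse_motion = {"HOTA": 54.251, "DetA": 54.57, "AssA": 54.529,
               "MOTA": 64.507, "IDF1": 67.47, "IDSW": 802, "Frag": 2044}

# Delta = fuse_motion - fuse_score (negative means fuse_motion is lower).
deltas = {name: round(fuse_motion[name] - fuse_score[name], 3)
          for name in fuse_score}

for name, delta in deltas.items():
    print(f"{name:>5}: {fuse_score[name]:>9} -> {fuse_motion[name]:>9}  ({delta:+})")
```

The detection side barely moves (DetA changes by 0.42 points), while the association side carries most of the drop: AssA falls by 3.84 points, IDF1 by 3.164 points, and ID switches roughly triple from 258 to 802.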

mikel-brostrom commented 1 year ago

What is the rationale behind this performance drop?