Open · Mahesha999 opened this issue 3 months ago
I was trying to use BoT-SORT with re-ID on a simple video in which a single person is walking along a road: he first gets occluded by a small tree and then by a billboard. This is drone footage, though not from a very high altitude.

I am using YOLOX as the detection model with the weights from `bytetrack_x_mot17.pth.tar`, and the re-ID model `mot17_sbs_S50.pth`. This is what the paper and the code base use by default.

BoT-SORT was able to correctly recognise the same person when he emerged from behind the tree. However, when he emerged from behind the billboard, he got assigned a new ID. I tried increasing `track_buffer`, `proximity_threshold`, `appearance_threshold` as well as `match_threshold`, but no luck.
So I tried to debug the code. Here are my observations. For long occlusions (like the billboard), the IoU similarity inside `matching.iou_distance()` is `[0]` (a single zero for the single-person detection). This makes `ious_dists = [1]` (line 6 in the code excerpt below, from the official BoT-SORT repo). For long occlusions, the appearance similarity also turns out to be `[0]`, making `emb_dists = [1]` (line 13). This makes the overall `dists = [1]`. This `dists` is then passed to the matching function on line 30. Since I had set `match_thresh` to 0.6, which is less than 1, it did not match/associate any existing tracklet with the detection bounding box of the person emerging from behind the billboard, and therefore assigned him a new ID.

So I increased `match_thresh` to 1.1 and it started working. However, this is just a hack, since the threshold is meant to range between 0 and 1, and setting it to anything above 1 (like 1.1) effectively means: even if all `dists` have value `1`, match existing tracks with whatever appears in the scene. If a new person enters the scene before the occluded person comes out from behind the billboard, that new person gets assigned the older person's ID! I observed the same thing when multiple people in the scene are occluded by some object: if a new person enters the scene before any of the occluded people re-appear, that person gets assigned an occluded person's ID.
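For reference, the relevant association logic in `tracker/bot_sort.py` looks roughly like this (paraphrased from the official repo, so the line numbers I quote above refer to the actual file and may not line up exactly with this sketch):

```python
# First association step of BoTSORT.update(), paraphrased; names as in the repo.
ious_dists = matching.iou_distance(strack_pool, detections)   # 1 - IoU, so 1.0 means "no overlap at all"
ious_dists_mask = (ious_dists > self.proximity_thresh)        # pairs whose boxes fail the IoU gate

if not self.args.mot20:
    ious_dists = matching.fuse_score(ious_dists, detections)  # fuse detection confidence into the cost

if self.args.with_reid:
    emb_dists = matching.embedding_distance(strack_pool, detections) / 2.0
    emb_dists[emb_dists > self.appearance_thresh] = 1.0       # reject weak appearance matches
    emb_dists[ious_dists_mask] = 1.0                          # reject pairs that failed the IoU gate
    dists = np.minimum(ious_dists, emb_dists)                 # fused motion + appearance cost
else:
    dists = ious_dists

matches, u_track, u_detection = matching.linear_assignment(dists, thresh=self.args.match_thresh)
```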
I have the following questions:
Q1. Why is `ious_dists = [1]`? Is it because there is no overlap between the bounding boxes before and after the occlusion?

Q2. Why is `emb_dists = [1]`? Is it because the re-ID model is not able to generate similar features for the same person before and after the occlusion?

Q3. If the answer to Q2 is yes, do I need to use a re-ID model fine-tuned on my dataset? Just for reference, the BoT-SORT paper says this:

while the FastReID paper says this:

Q4. If the answer to Q3 is yes, is there any approach / model that allows us to do re-identification without fine-tuning the re-ID model?
Q5. I am also doubtful about the above part of the code. If `ious_dists` is all `1`s (line 6), `ious_dists_mask` will become all `True` (line 7), which will make `emb_dists` all `1`s on line 16, making `dists` all `1`s on line 17. My understanding was that we should be relying on appearance similarity for long occlusions, but here zero IoU similarity is nullifying the appearance similarity exactly in the long-occlusion case (see the sketch below). Isn't that wrong? Or am I missing something?
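To make Q5 concrete, here is a minimal, self-contained sketch (plain NumPy, with made-up numbers for one lost track and one re-appearing detection; the 0.2 appearance distance and the threshold values are just illustrative) of how the IoU gate wipes out the appearance cost:

```python
import numpy as np

# Hypothetical costs for 1 lost track vs. 1 detection re-appearing after a long occlusion.
ious_dists = np.array([[1.0]])   # 1 - IoU: the boxes do not overlap at all
emb_dists  = np.array([[0.2]])   # assume the re-ID features are actually quite similar

proximity_thresh  = 0.5          # illustrative values in the spirit of the defaults / my config
appearance_thresh = 0.25
match_thresh      = 0.6

# Same gating logic as in the excerpt above.
ious_dists_mask = ious_dists > proximity_thresh     # [[True]]  -> IoU gate fails
emb_dists[emb_dists > appearance_thresh] = 1.0      # 0.2 <= 0.25, so appearance alone would accept it
emb_dists[ious_dists_mask] = 1.0                    # ...but the IoU gate overrides it anyway
dists = np.minimum(ious_dists, emb_dists)

print(dists)   # [[1.]]  -> above match_thresh = 0.6, so no association and a new ID is assigned
```

So even a good appearance match cannot survive a failed IoU gate, which seems to be exactly the long-occlusion case where appearance should matter most.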
A comment on this issue:

Hi! I really appreciate your detailed observations. I'm facing the exact same issue with BoT-SORT using YOLOX and the re-ID model (`mot17_sbs_S50.pth`). I've also tried adjusting parameters like `track_buffer`, `proximity_threshold`, `appearance_threshold`, and `match_threshold` without success. Have you found any effective solution to the re-ID problem beyond the threshold workaround?