AlbertoSabater / Robust-and-efficient-post-processing-for-video-object-detection

GNU General Public License v3.0

Objects changing trackIDs frequently. With appearance and w/o appearance #14

Closed anuragdeshmukh closed 3 years ago

anuragdeshmukh commented 3 years ago

I am trying to use REPP to get consistent detections and trackIDs at 3-5 fps. I tried both settings, with and without appearance features. For REPP with appearance, I used the YOLOv3 weights, the logistic regression parameters, and the appearance feature extractor provided by the author. You can see the output video here (note: this one is a 25 fps video): https://youtu.be/0p6enhG8fIA

The number on top of each object is its trackID. There are many ID switches for the dogs and the person, even when there is significant overlap between the previous frame's detection and the current frame's.

I tried lowering the clf_thr value from its original 0.7 down to 0.1, but that still didn't resolve the issue.

Thinking that this might be an issue with YOLOv3's detections (inconsistent class labels?), I tried using REPP without appearance with my own detector, which detects only one class (people).

The method worked perfectly well when there was only one person in the video: it gave a single consistent trackID for a running person even at 3 fps. But it started to break with multiple people in the frame. The trackID switched from one person to another even when the other person was across the frame.

Is this expected? What else could be the reasons for failure in each of the settings? Thanks!

AlbertoSabater commented 3 years ago

Hi! Note that REPP has not been designed to be a tracking method. It links detections across frames (tracking), but when a detection is missing it starts again with a new trackID.

In the first case, a quick fix would be to implement a second linking step between tubelets. Then infer the missing detections by interpolation and use the same trackID for all the linked tubelets. In my experiments, it didn't improve the final mAP, but it could help to get a better visualization. Unfortunately, I no longer have that code. You could also apply a second NMS over tubelets with a more restrictive threshold to remove overlapping detections.
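For readers who want to try the second linking step, here is a minimal sketch. It is not the original code (which is no longer available) and it does not use REPP's API. It assumes each tubelet is a plain dict mapping a frame index to an `(x1, y1, x2, y2)` box. The function names and the `max_gap` and `min_iou` parameters are hypothetical, chosen only for illustration. Tubelets are linked greedily when one starts shortly after another ends and the boundary boxes overlap enough, and the missing frames are filled by linear interpolation.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def interpolate(box_a, box_b, n_missing):
    """Return n_missing boxes linearly interpolated between box_a and box_b."""
    boxes = []
    for k in range(1, n_missing + 1):
        t = k / (n_missing + 1)
        boxes.append(tuple(a + t * (b - a) for a, b in zip(box_a, box_b)))
    return boxes


def link_tubelets(tubelets, max_gap=5, min_iou=0.3):
    """Greedily merge tubelets separated by a short gap of missing frames.

    tubelets: list of dicts {frame_idx: (x1, y1, x2, y2)} (assumed format).
    max_gap, min_iou: hypothetical thresholds, to be tuned per video.
    Returns a list of merged tubelets; the list index can serve as trackID.
    """
    merged = []
    # Process tubelets in order of their first frame.
    for tube in sorted(tubelets, key=min):
        start = min(tube)
        best, best_iou = None, min_iou
        for cand in merged:
            end = max(cand)
            gap = start - end - 1  # number of frames with no detection
            if 0 <= gap <= max_gap:
                score = iou(cand[end], tube[start])
                if score >= best_iou:
                    best, best_iou = cand, score
        if best is None:
            merged.append(dict(tube))  # no match: start a new track
        else:
            end = max(best)
            gap = start - end - 1
            # Fill the missing frames, then append the new tubelet.
            for k, box in enumerate(interpolate(best[end], tube[start], gap), 1):
                best[end + k] = box
            best.update(tube)
    return merged


if __name__ == "__main__":
    first = {0: (0, 0, 10, 10), 1: (1, 0, 11, 10)}
    second = {4: (4, 0, 14, 10), 5: (5, 0, 15, 10)}  # same object after a gap
    for track_id, tube in enumerate(link_tubelets([first, second])):
        print(track_id, sorted(tube))
```

Comparing only the boundary boxes by IoU is the simplest possible criterion. At 3-5 fps, objects can move far between frames, so appearance features or a motion model may be needed to make the matching reliable.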

In the first case, REPP improves the YOLO detections. However, YOLO has been designed to be fast, and it is not the most accurate detector. I would suggest using REPP with a more accurate detector or retraining YOLO for a more specific task.

In the second case, you could retrain the REPP classifier for your specific single-person case. Are you also getting undesired links when working at a higher frame rate (~25 fps)?