Closed ghost closed 3 years ago
It's interesting to see the models in the real world. Could you provide more details about your experiments? For example, how did you train our model? Which dataset did you use? Also, it would be great if you could provide us with some visualization results so that we can better understand the issue. :)
Hello. I actually went ahead and implemented a system on top of your codebase that rejects a track if it is not seen within the next N frames after the last frame in which it was seen. So I guess I have resolved the issue (it would still be good if you added this to your codebase, though).
As for the specifics, we ran your pretrained model (the one trained on the TAO dataset) on a video we recorded inside our computer lab. The lab contains many similar-looking computer monitors, and even the background is fairly uniform because the walls are plain. The tracker sometimes confuses two similar-looking monitors as being the same object: both are assigned the same track when in reality they are some distance apart.
One way we decided to deal with this problem is to simply discard a track if it does not appear in the next frame. We then run our own algorithm to match tracks that could potentially belong to the same object.
I am closing this issue since I managed to solve the problem, but feel free to write a follow-up comment. For anyone looking to implement such a system themselves: assign an integer index to each image frame, store the sequence of frames in which a track appears as a sequence of integers, split that sequence into contiguous intervals, and assign each interval a new track id.
For example, take track 4, which is seen in the frames identified by the integers 0, 1, 2, 3, 4, 5, 6, 20, 21, 22, 23, 24, 25 (so 0 = image_0.jpeg, 1 = image_1.jpeg, and so on). You can find the two contiguous intervals [0, 1, 2, 3, 4, 5, 6] and [20, 21, 22, 23, 24, 25] and assign a new track id to each. Here, if the maximum track id was 337 (from the original QDTrack output), you would assign a new id of 338 to [0, 1, 2, 3, 4, 5, 6] and 339 to [20, 21, 22, 23, 24, 25]. To tolerate an arbitrary gap of N frames, tweak the approach so that two consecutive intervals are merged whenever the gap between the last frame of one and the first frame of the next is less than N (and this merging can continue across subsequent intervals as well).
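The splitting-and-relabeling step described above can be sketched as follows. This is a minimal standalone example; the function name and signature are my own and not part of the QDTrack codebase:

```python
def split_track_by_gaps(frame_ids, start_new_id, max_gap=1):
    """Split one track's sorted frame indices into sub-tracks wherever
    consecutive frames are more than max_gap apart, then assign each
    sub-track a fresh id starting from start_new_id.

    Returns a dict mapping new track id -> list of frame indices.
    """
    # Start the first interval with the first frame the track appears in.
    intervals = [[frame_ids[0]]]
    for f in frame_ids[1:]:
        if f - intervals[-1][-1] <= max_gap:
            # Gap is small enough: extend the current interval.
            intervals[-1].append(f)
        else:
            # Gap exceeds max_gap: the track was "lost", open a new interval.
            intervals.append([f])
    return {start_new_id + i: iv for i, iv in enumerate(intervals)}
```

With the numbers from the example above, `split_track_by_gaps([0, 1, 2, 3, 4, 5, 6, 20, 21, 22, 23, 24, 25], 338)` yields `{338: [0, 1, 2, 3, 4, 5, 6], 339: [20, 21, 22, 23, 24, 25]}`; passing a larger `max_gap` implements the N-frame tolerance, merging intervals whose gap is within N.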
Hi. Your embeddings often seem to match the wrong object when tried out in random outdoor scenarios, and I see no other choice but to drop a track's embeddings if the track has no match in the next frame. However, doing that with your current codebase doesn't seem straightforward. Could you please add this ability to the codebase? Thanks.