kennyvoo closed this issue 4 months ago.
Hi, thank you for your questions.
The main goal of the validation process has always been to maximize HOTA. MOTA depends heavily on detection quality and is therefore not a fair measure of tracking accuracy. The reported results are the average of the outcomes observed during the validation stages.
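To make the MOTA-vs-HOTA point concrete, here is a minimal sketch of the standard definitions (it is not SFSORT or TrackEval code; the function names and toy counts are hypothetical). MOTA subtracts missed detections, false positives and ID switches in one pool, so detection errors usually dominate it. HOTA, at each localization threshold alpha, is the geometric mean of a detection score (DetA) and an association score (AssA); the full metric averages it over alpha = 0.05, 0.10, ..., 0.95.

```python
import math


def mota(fn: int, fp: int, idsw: int, num_gt: int) -> float:
    """MOTA = 1 - (FN + FP + IDSW) / GT, with GT the number of ground-truth boxes."""
    return 1.0 - (fn + fp + idsw) / num_gt


def hota_at_alpha(tp: int, fn: int, fp: int, assoc_scores: list) -> float:
    """HOTA at one localization threshold alpha (full HOTA averages 19 alphas).

    DetA = TP / (TP + FN + FP)
    AssA = mean association score A(c) over the TP matches c
    HOTA_alpha = sqrt(DetA * AssA)
    """
    if len(assoc_scores) != tp:
        raise ValueError("need one association score per true-positive match")
    det_a = tp / (tp + fn + fp)
    ass_a = sum(assoc_scores) / tp
    return math.sqrt(det_a * ass_a)


# Toy counts (invented for illustration): doubling ID switches moves MOTA by 0.02,
# while 100 extra missed detections move it by 0.10.
print(mota(fn=200, fp=100, idsw=20, num_gt=1000))  # baseline
print(mota(fn=200, fp=100, idsw=40, num_gt=1000))  # more ID switches
print(mota(fn=300, fp=100, idsw=20, num_gt=1000))  # more missed detections
```

For real numbers, use an evaluation toolkit such as TrackEval rather than these hand-rolled helpers.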
In Table IV, the results after post-processing are reported because otherwise, there would be few items left in the table for comparison.
Based on my research, YOLOX outperforms YOLOv8 for high-accuracy tracking, as evidenced by the findings presented in Table III and Figure 8 of the paper. However, if you require a very fast tracker, particularly for tracking fast-moving objects like a golf ball, I recommend utilizing YOLOv8.
Thank you for the prompt reply! This is excellent work! Even without post-processing, it achieves very good results after hyperparameter tuning. Could you please advise whether there are any mistakes in the following results, or any further improvements I could make?
Following the guide:

> - LTH (low_th) and MTH2 (match_th_second) are hyperparameters influenced by the behavior of the object detector.
> - Both HTH (high_th) and MTH1 (match_th_first) should be decreased to achieve an adaptive tracker.
> - Therefore, NTH (new_track_th) should be increased to prevent identity switching.
I changed the following parameters to try to improve HOTA (just trying out a few different values):

| Param | Original | New |
|---|---|---|
| high_th | 0.82 | 0.7 |
| match_th_first | 0.5 | 0.6 |
| match_th_second | 0.1 | 0.4 |
| low_th | 0.3 | 0.2 |
| new_track_th | 0.7 | 0.5 |
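A sweep like this is easier to reproduce if each configuration is generated from one base dict. The sketch below is hypothetical: the keys are the parameter names from the table, but whether SFSORT accepts exactly this dict, and the `nth_sweep` helper itself, are assumptions. The `low_th < high_th` check is also an assumption, based on the BYTE-style idea that detections scoring between the two thresholds form the lower-confidence pool.

```python
# Hypothetical sketch: parameter names follow the table above; whether SFSORT
# accepts exactly this dict is an assumption, not its documented API.
ORIGINAL = {
    "high_th": 0.82,
    "match_th_first": 0.5,
    "match_th_second": 0.1,
    "low_th": 0.3,
    "new_track_th": 0.7,
}

NEW = {
    "high_th": 0.7,
    "match_th_first": 0.6,
    "match_th_second": 0.4,
    "low_th": 0.2,
    "new_track_th": 0.5,
}


def nth_sweep(base: dict, values) -> list:
    """Return one config per new_track_th value, keeping all other thresholds fixed."""
    configs = []
    for nth in values:
        cfg = {**base, "new_track_th": nth}  # copy, so `base` is never mutated
        # Assumed sanity check: detections scoring between low_th and high_th form
        # the lower-confidence pool, so low_th must stay below high_th.
        if not cfg["low_th"] < cfg["high_th"]:
            raise ValueError(f"low_th must be < high_th: {cfg}")
        configs.append(cfg)
    return configs


for cfg in nth_sweep(NEW, (0.7, 0.6, 0.5)):
    print(cfg)  # run the tracker and the evaluation once per config
```

Changing one threshold at a time this way makes it clear which parameter is responsible for each change in HOTA, IDF1 and ID switches.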
If the objective is to increase HOTA and IDF1, reducing NTH seems to give a significant boost to both, but at the cost of more ID switches (IDs). So, does it make more sense to keep ID switches low and accept a lower HOTA?
Tracker | HOTA | MOTA | IDF1 | IDs |
---|---|---|---|---|
Bytetrack | 60.498 | 65.208 | 69.603 | 552 |
SFSORT (default) | 57.134 | 55.619 | 65.488 | 463 |
SFSORT (new + NTH=0.7) | 58.36 | 58.498 | 66.992 | 391 |
SFSORT (new + NTH=0.6) | 61.644 | 66.384 | 70.69 | 575 |
SFSORT (new + NTH=0.5) | 61.963 | 68.427 | 70.784 | 714 |
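One way to frame this trade-off is as a constrained choice: set the maximum number of ID switches the application can tolerate, then take the highest-HOTA configuration within that budget. The sketch below uses the numbers from the table; the helper name and the budget values are hypothetical, and the right budget depends on the application.

```python
from typing import Optional

# Results copied from the table above: (HOTA, MOTA, IDF1, IDs = ID switches).
RESULTS = {
    "Bytetrack": (60.498, 65.208, 69.603, 552),
    "SFSORT (default)": (57.134, 55.619, 65.488, 463),
    "SFSORT (new + NTH=0.7)": (58.36, 58.498, 66.992, 391),
    "SFSORT (new + NTH=0.6)": (61.644, 66.384, 70.69, 575),
    "SFSORT (new + NTH=0.5)": (61.963, 68.427, 70.784, 714),
}


def best_under_idsw_budget(results: dict, max_idsw: Optional[int] = None) -> str:
    """Return the row with the highest HOTA whose ID switches stay within max_idsw."""
    candidates = {
        name: row for name, row in results.items()
        if max_idsw is None or row[3] <= max_idsw
    }
    if not candidates:
        raise ValueError("no configuration satisfies the ID-switch budget")
    return max(candidates, key=lambda name: candidates[name][0])


print(best_under_idsw_budget(RESULTS))                # no budget: pure HOTA
print(best_under_idsw_budget(RESULTS, max_idsw=600))  # tolerate up to 600 switches
print(best_under_idsw_budget(RESULTS, max_idsw=500))  # stricter identity budget
```

With no budget, NTH=0.5 wins on HOTA. Capping ID switches at 600 selects NTH=0.6, and capping them at 500 selects NTH=0.7.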
One more question: in my experiment, measuring only the predict function of both SFSORT and ByteTrack v1, the speed difference is only about 4x.
Thank you for the information you shared. The default configuration of SFSORT is aligned with YOLOX. As you mentioned, it's advisable to adjust the hyperparameters when employing a new object detector.
During my studies, I've found that the choice of object detector greatly influences tracking accuracy. While reaching a HOTA above 90% on MOT17 may be challenging because of the dataset's specific assumptions, I've observed HOTA values close to 95% on videos from other datasets, thanks to careful fine-tuning of the object detector. To reach higher accuracy on MOT17 and MOT20, a time-consuming but highly effective approach is to fine-tune state-of-the-art object detectors, such as YOLOv9, on diverse datasets of human images. ByteTrack used this strategy with YOLOX, which contributed greatly to its success.
Whether to prioritize ID switches or HOTA depends on the application. In offline tracking, where IDs can be corrected through post-processing, a higher HOTA is often preferred. Conversely, when tracking crowded scenes, keeping ID switches low may be the better priority.
To calculate the tracking speed, I measured the time from the moment the detections were delivered to the tracker until the IDs were returned, following the advice on the MOTChallenge website. Because background noise from the OS, server, IDE, etc. can affect measurement accuracy, the speed reported in the paper is the average over several repetitions of the experiment. For the measurements, I used Python's `time` module and `datetime.timedelta`.
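The measurement protocol described here can be sketched as follows. This is a minimal illustration, not the paper's benchmarking code. It uses `time.perf_counter` in place of the exact `time`/`timedelta` calls, and `make_dummy_tracker` is a hypothetical stand-in for a real tracker's per-frame update. A fresh tracker is built for each repetition, so accumulated track state from earlier runs does not skew later ones.

```python
import statistics
import time


def seconds_per_frame(make_update_fn, detections_per_frame, repetitions: int = 5) -> float:
    """Mean tracker time per frame, measured from the moment detections are handed to
    the tracker until its IDs come back; detector inference is not timed."""
    run_means = []
    for _ in range(repetitions):
        update_fn = make_update_fn()  # fresh tracker state for every repetition
        elapsed = 0.0
        for dets in detections_per_frame:
            start = time.perf_counter()
            update_fn(dets)  # only the tracker call sits inside the timed region
            elapsed += time.perf_counter() - start
        run_means.append(elapsed / len(detections_per_frame))
    # Averaging several repetitions dampens background noise (OS, IDE, other processes).
    return statistics.mean(run_means)


def make_dummy_tracker():
    """Hypothetical stand-in: 'assigns' an ID to each detection. Replace with a real tracker."""
    return lambda dets: list(range(len(dets)))


if __name__ == "__main__":
    frames = [[(0, 0, 10, 10, 0.9)] * 5 for _ in range(100)]
    spf = seconds_per_frame(make_dummy_tracker, frames)
    print(f"{spf * 1e6:.2f} us/frame, {1.0 / spf:.0f} FPS")
```

To compare two trackers, wrap each one's per-frame update call in its own factory, feed both the same pre-computed detections, and take the ratio of the two results.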
Hello, thank you for the great work. I'd like to clarify a few things.
If I directly use the provided yolov8n and evaluate on the MOT17 train set, the MOTA difference between SFSORT and ByteTrack is 10 points. Is this normal, or did I miss some configuration or step? I'll also check whether I made a mistake somewhere. (Attached: bytetrack_trackeval.txt, SFSORT_trackeval.txt)
I'm using the default configuration and only update the frame rate and frame size for each video.
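Frame rate and resolution vary across MOT17 sequences, so the per-video update can be automated. Each sequence folder ships a `seqinfo.ini` whose `[Sequence]` section lists `frameRate`, `imWidth` and `imHeight`. The sketch below reads those values with the standard-library `configparser`. The output key names (`frame_rate`, `frame_width`, `frame_height`) are hypothetical; map them to whatever names the tracker's configuration actually uses.

```python
import configparser

SAMPLE_SEQINFO = """\
[Sequence]
name=MOT17-02-FRCNN
imDir=img1
frameRate=30
seqLength=600
imWidth=1920
imHeight=1080
imExt=.jpg
"""


def sequence_overrides(seqinfo_text: str) -> dict:
    """Read frame rate and frame size from a MOT17 seqinfo.ini [Sequence] section."""
    parser = configparser.ConfigParser()
    parser.read_string(seqinfo_text)
    seq = parser["Sequence"]  # option lookup is case-insensitive in configparser
    return {
        "frame_rate": seq.getint("frameRate"),
        "frame_width": seq.getint("imWidth"),
        "frame_height": seq.getint("imHeight"),
    }


# Hypothetical merge: defaults stay fixed, only per-sequence values are overridden.
defaults = {"new_track_th": 0.7}
print({**defaults, **sequence_overrides(SAMPLE_SEQINFO)})
```

Reading the values from the dataset's own metadata rules out one possible source of a MOTA gap: a frame rate or frame size typed in by hand for the wrong sequence.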