bytedance / ColTrack

This repository is the official implementation of "Collaborative Tracking Learning for Frame-Rate-Insensitive Multi-Object Tracking".
Apache License 2.0

Details of the pre-training part. #4

lzzppp closed this issue 9 months ago

lzzppp commented 9 months ago

The paper does not talk about the pre-training part. Why is there a pre-training step?

easonbyte commented 9 months ago

We explain the reasons for pre-training in the 'Implementation Details' part of Section 4.1 of the paper. To reduce the GPU memory usage and increase the video clip length during training, we use the detection model trained by Baseline+Bytetrack to initialize the CNN and encoders of Baseline+E2E and ColTrack.

lzzppp commented 9 months ago

Thank you very much for your answer; it solved my problem. I still have a question: the ByteTrack results on the validation set in your comparison seem to be lower than the results reported in the ByteTrack paper.

easonbyte commented 9 months ago

In Fig. 4, the performance of YoloX+bytetrack is obtained using its officially released model weights and code, so it may differ from the results in the ByteTrack paper. I am not sure whether your question refers to this value. If not, please point out its specific location and value, as well as the corresponding value in the ByteTrack paper.

easonbyte commented 9 months ago

In addition, it is worth noting that BL+Bytetrack is not equivalent to YoloX+Bytetrack. For the settings of BL (Baseline), please refer to its description in our paper.

lzzppp commented 9 months ago

Thank you very much for your explanation. I don't understand why there are two ByteTrack versions. In Table 2, the BL+ByteTrack results are worse than the results reported in the ByteTrack paper. My other question is whether all of the ColTrack results use the IPTrack post-processing operation, because the MOT17 version of ColTrack that I trained on 8 V100 GPUs did not reproduce the results in the paper.

easonbyte commented 9 months ago

ByteTrack can be regarded as a post-processing operation for identity matching, and it is suitable for two-stage MOT methods. You can use any detection model to obtain detection boxes and then use ByteTrack for target association. The ByteTrack paper uses YOLOX to detect targets, whereas we use a DETR-like model. We use Yolox+Bytetrack and BL+Bytetrack to denote these two settings, respectively.
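To illustrate the two-stage idea, here is a minimal sketch in which the detector is just a callable that can be swapped freely while the association step stays the same. Everything in it is an assumption for illustration: the names `iou`, `associate`, and `track` are hypothetical, and the association is a plain greedy IoU matcher standing in for ByteTrack, not ByteTrack itself (it has no score-based second matching round, no motion model, and no lost-track buffer).

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(
    tracks: Dict[int, Box],
    boxes: List[Box],
    next_id: int,
    iou_thresh: float = 0.3,
) -> Tuple[Dict[int, Box], int]:
    """Greedy IoU matching (a stand-in, NOT ByteTrack).

    Each detection takes the still-unmatched track with the highest IoU
    above the threshold, otherwise it starts a new track. Tracks that
    receive no detection are simply dropped in this sketch.
    """
    updated: Dict[int, Box] = {}
    unmatched = set(tracks)
    for box in boxes:
        best_id, best_iou = None, iou_thresh
        for tid in unmatched:
            score = iou(tracks[tid], box)
            if score >= best_iou:
                best_id, best_iou = tid, score
        if best_id is None:
            best_id = next_id
            next_id += 1
        else:
            unmatched.discard(best_id)
        updated[best_id] = box
    return updated, next_id


def track(frames, detector: Callable[[object], List[Box]]) -> List[Dict[int, Box]]:
    """Stage 1: detect boxes per frame. Stage 2: associate them into identities."""
    tracks: Dict[int, Box] = {}
    next_id = 0
    history: List[Dict[int, Box]] = []
    for frame in frames:
        tracks, next_id = associate(tracks, detector(frame), next_id)
        history.append(dict(tracks))
    return history
```

Passing a different `detector` (for example one wrapping YOLOX or a DETR-like model) changes only the first stage, which is why the same association method can be reported with different detectors.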

easonbyte commented 9 months ago

In some papers, ByteTrack is considered to include both the detection model and the association algorithm. We believe the exact definition is not the point, as long as the experimental settings are clear to everyone.

easonbyte commented 9 months ago

IPTrack is not used in any of the experiments. You need to carefully tune the hyperparameters (epochs, thresholds, etc.) to obtain better results. We use the same hyperparameter tuning strategy as ByteTrack. More details can be found on the ByteTrack GitHub page.

lzzppp commented 9 months ago

Thank you for your patient guidance. I have one last question. If I use the default e2e_submit_mot17.py and coltrack_inference.py configurations you provided and test at sampling rates of 1, 2, 3, 6, 10, 15, and 30, are the results shown in the picture normal? [Image: Introduction_HOTA]

easonbyte commented 9 months ago

Let me explain how we drew Fig. 1 of the paper. We first obtain the HOTA performance of each method at different frame rates. For each HOTA value, we calculate the processing time at the lowest frame rate required to achieve that performance. The abscissa is the time the method needs to process one second of video, which is obtained from the number of frames to be processed and the time the method takes to process one frame.
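The procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the script used for the paper: the function names `time_per_video_second` and `fig1_points` are hypothetical, the per-frame time is assumed to be constant across frame rates, and "number of frames to be processed" is taken to be the frame rate itself (frames in one second of video).

```python
from typing import Dict, List, Tuple


def time_per_video_second(frame_rate: float, seconds_per_frame: float) -> float:
    """Time to process one second of video: frames processed times per-frame time."""
    return frame_rate * seconds_per_frame


def fig1_points(
    hota_by_fps: Dict[float, float],
    seconds_per_frame: float,
) -> List[Tuple[float, float]]:
    """Return (processing time, HOTA) points for one method.

    hota_by_fps maps a tested frame rate to the HOTA measured at that rate.
    For each measured HOTA value, the lowest frame rate that reaches at
    least that value is used to compute the time on the x-axis.
    """
    points = []
    for target in sorted(set(hota_by_fps.values())):
        lowest_fps = min(fps for fps, hota in hota_by_fps.items() if hota >= target)
        points.append((time_per_video_second(lowest_fps, seconds_per_frame), target))
    return points


if __name__ == "__main__":
    # Hypothetical numbers for illustration only, not results from the paper.
    example = {1: 40.0, 2: 50.0, 3: 49.0, 6: 60.0}
    for seconds, hota in fig1_points(example, seconds_per_frame=0.05):
        print(f"{seconds:.3f} s per video second -> HOTA {hota:.1f}")
```

Plotting these points for each method, with time on the x-axis and HOTA on the y-axis, gives a curve that can be compared against Fig. 1 in a consistent way.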

easonbyte commented 9 months ago

I suggest you draw this figure the way we did in our paper; otherwise, I cannot judge the numbers. In addition, the settings for ColTrack on the MOT17 validation set are given in e2e_submit_mot17.py, while coltrack_inference.py is for the model trained on the DanceTrack dataset.

lzzppp commented 9 months ago

Thank you so much! Maybe I need to do some more experiments.