MCG-NJU / MeMOTR

[ICCV 2023] MeMOTR: Long-Term Memory-Augmented Transformer for Multi-Object Tracking
https://arxiv.org/abs/2307.15700
MIT License

About bdd100k #8

Closed BruceYu-Bit closed 7 months ago

BruceYu-Bit commented 8 months ago

Congratulations on this achievement! But I wonder: when will you release the BDD100K model and training methods?

HELLORPG commented 8 months ago

Hi, thank you for your interest in our work.

As the BDD100K evaluation server has some issues, we can't double-check our model and training scripts on BDD100K. This hinders our open-source progress because the BDD100K code needs to be merged into this repo.

I'll release the code and scripts for the BDD100K dataset this weekend (as soon as possible). However, I can't rigorously check the model's outputs. If you have any problems, feel free to ask me.

HELLORPG commented 8 months ago

Hi, I have now uploaded all the code, configuration, and checkpoint files for BDD100K. Please note that I'm currently unable to double-check the absolute correctness of all of them. Although the calculation code for TETA is provided in this repo, I'm still having some trouble:

  1. There are some issues with running this code directly, even using the output files we already submitted to the evaluation server.
  2. After I fixed the code, the evaluation results were slightly different (by about 0.4 TETA) from the results on eval.ai.

Moreover, our experiment on BDD100K was rushed (to be precise, we only had one chance, because a single run costs about one week, which is SO LONG), so I did not tune the training strategy very well (it primarily follows MOTR). Therefore, there may be some known instability factors.

Most likely, the cause is the difference between the training and inference image sizes. The max image size during training is 800x1333, chosen to save training memory and time, while the max image size during inference is 800x1536. In our experiments, using 800x1536 for inference achieves better tracking performance, but it may also cause convergence instability, because there is a gap in position encoding between training and inference. Based on experience, I think the model could achieve better tracking performance if a max image size of 1536 (as we did for DanceTrack/SportsMOT) were also adopted during the training stage, but I do not have enough spare GPUs for such a long training run right now.
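To make the size mismatch concrete, here is a minimal sketch (not the repo's actual code) of the shorter-side / longest-side resize rule that DETR-style models commonly use, showing how the two `max_size` settings above lead to different input resolutions for the same frame:

```python
# Minimal sketch of a DETR-style resize rule (an assumption for
# illustration, not MeMOTR's actual preprocessing code).
def resize_shape(h, w, short=800, max_size=1333):
    """Scale so the shorter side becomes `short`, but cap the scale
    so the longer side never exceeds `max_size`."""
    scale = short / min(h, w)
    if max(h, w) * scale > max_size:
        scale = max_size / max(h, w)
    return round(h * scale), round(w * scale)

# A 720x1280 BDD100K frame under the two settings discussed above:
train_shape = resize_shape(720, 1280, max_size=1333)  # longer side capped at 1333
infer_shape = resize_shape(720, 1280, max_size=1536)  # shorter side reaches 800
```

Under the training cap the shorter side ends up below 800, while at inference it reaches the full 800; the resulting difference in spatial extent is what creates the position-encoding gap mentioned above.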

In short, if you encounter any problems in the BDD100K experiments, please feel free to contact me, and we can discuss them together.

HELLORPG commented 7 months ago

As I haven't received a reply for a long time, I am closing this issue for now. Feel free to re-open it if you need~