Sometimes the detector categorizes the same object differently under different views. If we imposed a hard constraint on object category for re-ID, we would fail to re-identify objects that are assigned multiple categories.
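To make the point above concrete, here is a minimal sketch contrasting a hard category constraint with a relaxed one. Everything in it is hypothetical and for illustration only: the `Detection` class, the `sim_to_track` field, the two `match_*` functions, and the 0.5 threshold are not taken from the actual code, and the relaxed matcher simply ignores the category, which may not be exactly what the pipeline does.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    category: str        # label predicted by the detector in this view
    sim_to_track: float  # hypothetical appearance similarity to an existing track


def match_hard(det: Detection, track_category: str, threshold: float = 0.5) -> bool:
    """Hard constraint: the categories must agree AND the similarity must pass."""
    return det.category == track_category and det.sim_to_track >= threshold


def match_relaxed(det: Detection, track_category: str, threshold: float = 0.5) -> bool:
    """Relaxed variant: the category is ignored, only the similarity decides."""
    return det.sim_to_track >= threshold


# The same physical object, labelled "cup" in one view and "mug" in another.
det = Detection(category="cup", sim_to_track=0.9)
print(match_hard(det, "mug"))     # rejected because the labels differ
print(match_relaxed(det, "mug"))  # accepted on appearance similarity alone
```

With the hard constraint, the object is lost as soon as the detector changes its label between views, even though the appearance similarity is high.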
We mainly grid-search the ensemble weights in sim(i, j) and the threshold on sim(i, j) in Algorithm 3. The parameters in CLIP(i, j) and DINOv2(i, j) are used to map the two scores onto the same distribution.
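A rough sketch of what such a grid search could look like is given below. It is not the actual implementation: the linear rescaling in `rescale`, the score ranges, the weighted sum in `ensemble_sim`, the F1 objective, and the labelled pairs are all assumptions made for illustration, and the exact form of sim(i, j) should be taken from Algorithm 3.

```python
from itertools import product


def rescale(score, low, high):
    """Linearly map a score from [low, high] to [0, 1], clipping outliers."""
    x = (score - low) / (high - low)
    return min(1.0, max(0.0, x))


def ensemble_sim(clip_score, dino_score, w,
                 clip_range=(0.5, 1.0), dino_range=(0.0, 1.0)):
    """Weighted sum of the two rescaled scores (assumed form of sim(i, j))."""
    c = rescale(clip_score, *clip_range)
    d = rescale(dino_score, *dino_range)
    return w * c + (1.0 - w) * d


def f1(preds, labels):
    """F1 score for boolean same-object predictions."""
    tp = sum(p and l for p, l in zip(preds, labels))
    fp = sum(p and not l for p, l in zip(preds, labels))
    fn = sum((not p) and l for p, l in zip(preds, labels))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def grid_search(pairs, labels, weights, thresholds):
    """Return (best_f1, weight, threshold) over the given grid."""
    best = (-1.0, None, None)
    for w, t in product(weights, thresholds):
        preds = [ensemble_sim(c, d, w) >= t for c, d in pairs]
        score = f1(preds, labels)
        if score > best[0]:
            best = (score, w, t)
    return best


# Hypothetical labelled pairs: (CLIP score, DINOv2 score) and same-object flag.
pairs = [(0.95, 0.9), (0.9, 0.8), (0.6, 0.1), (0.7, 0.2)]
labels = [True, True, False, False]
print(grid_search(pairs, labels, weights=[0.0, 0.5, 1.0], thresholds=[0.3, 0.5, 0.7]))
```

Rescaling before the weighted sum matters because raw CLIP and DINOv2 cosine similarities usually occupy different ranges, so a single threshold on an unscaled sum would be dominated by one of them.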
We have tried some recent MOT and Re-ID methods, but their performance was not very satisfactory. Robust multi-object tracking and Re-ID on long-form video still await exploration. If you know of any good MOT and Re-ID methods, feel free to recommend them here!
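I am very interested in your work, and thank you very much for this great work. I have a few questions I would like to ask, and I hope to hear back from you!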