chengche6230 / ReST

[ICCV 2023] ReST: A Reconfigurable Spatial-Temporal Graph Model for Multi-Camera Multi-Object Tracking
MIT License

Suggestions for bad temporal inference and illogical spatial assignments within the scene #16

Closed by deepsworld 1 month ago

deepsworld commented 2 months ago

Thank you for sharing your great work. We are trying to use the ReST code and framework on the MMPTrack dataset, which has fully overlapping camera views and a larger training set than Wildtrack. The training and inference code runs without errors, but the results are poor. We did not modify the code other than the dataset handling needed to support this custom dataset. Could you please offer some suggestions or advice on the following issues:

  1. Frame-wise tracking within a single camera view produces track IDs that change almost every frame.
  2. Spatial assignment across cameras partly works, but it assigns the same track ID to multiple people in the same camera view.

Please find below a sequence of 5 annotated frames (frames #21-25; each image shows a grid of all views) from the MMPTrack-pretrained model, for your reference. We use the same config as for Wildtrack, and both training and prediction use ground-truth boxes.

[Images: grid_21, grid_22, grid_23, grid_24, grid_25]

Thanks, Deep
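The two symptoms listed above can be quantified before looking for their cause. The following is a minimal sketch, not part of the ReST code: it assumes predictions have been exported as `(frame, cam, gt_id, pred_id)` tuples, which is a hypothetical format chosen for illustration (a ground-truth ID per box is available here because ground-truth boxes are used). The function names are also made up for this example.

```python
from collections import Counter, defaultdict

def duplicate_ids_per_view(rows):
    """Return {(frame, cam): [pred_ids given to more than one box in that view]}."""
    counts = defaultdict(Counter)
    for frame, cam, _gt_id, pred_id in rows:
        counts[(frame, cam)][pred_id] += 1
    result = {}
    for key, counter in counts.items():
        dups = sorted(pid for pid, n in counter.items() if n > 1)
        if dups:
            result[key] = dups
    return result

def count_id_switches(rows):
    """Return {(cam, gt_id): number of times pred_id changes between consecutive observations}."""
    tracks = defaultdict(list)
    for frame, cam, gt_id, pred_id in rows:
        tracks[(cam, gt_id)].append((frame, pred_id))
    switches = {}
    for key, obs in tracks.items():
        obs.sort()  # order each person's observations by frame
        switches[key] = sum(1 for (_, a), (_, b) in zip(obs, obs[1:]) if a != b)
    return switches

# Toy data (hypothetical): camera 0, two people "A" and "B", frames 21-23.
rows = [
    (21, 0, "A", 1), (21, 0, "B", 1),  # same predicted ID for two people
    (22, 0, "A", 1), (22, 0, "B", 2),
    (23, 0, "A", 3), (23, 0, "B", 2),
]
print(duplicate_ids_per_view(rows))
print(count_id_switches(rows))
```

A nonempty result from `duplicate_ids_per_view` corresponds to issue 2, and switch counts close to the number of frames correspond to issue 1.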

chengche6230 commented 2 months ago

Hi

Thanks for your interest in our research. As stated in the paper, our model relies mostly on geometric position features. You need to make sure that projections of the same person from different views land close together, while projections of different people stay far apart. To achieve this, refine the homography matrix for your own dataset instead of reusing a homography from another dataset. The model may not work well when the region is too small and crowded.

Best
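One way to check the projection condition described above is to project each box onto the ground plane and compare distances. The sketch below is an illustration only, not the ReST implementation: it assumes one 3x3 image-to-ground homography per camera, uses the bottom-center of each box as the foot point, and all function names and toy values are hypothetical.

```python
import numpy as np

def foot_points(boxes):
    """Bottom-center (x, y) of boxes given as (x1, y1, x2, y2)."""
    b = np.asarray(boxes, dtype=float)
    return np.stack([(b[:, 0] + b[:, 2]) / 2.0, b[:, 3]], axis=1)

def project_to_ground(H, points):
    """Apply a 3x3 homography H to an (N, 2) array of image points."""
    pts = np.asarray(points, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ np.asarray(H, dtype=float).T
    return mapped[:, :2] / mapped[:, 2:3]  # divide by the homogeneous coordinate

def projection_gap(ground_pts, person_ids):
    """Return (max same-person distance, min different-person distance)."""
    g = np.asarray(ground_pts, dtype=float)
    ids = np.asarray(person_ids)
    dist = np.linalg.norm(g[:, None, :] - g[None, :, :], axis=-1)
    same = ids[:, None] == ids[None, :]
    off_diag = ~np.eye(len(ids), dtype=bool)  # ignore each point's distance to itself
    same_d = dist[same & off_diag]
    diff_d = dist[~same]
    same_max = float(same_d.max()) if same_d.size else 0.0
    diff_min = float(diff_d.min()) if diff_d.size else float("inf")
    return same_max, diff_min

# Toy setup (hypothetical): two cameras, two people, one frame.
H_a = np.eye(3)
H_b = np.array([[1.0, 0.0, -100.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
boxes_a = [[0, 0, 10, 20], [40, 0, 50, 20]]
boxes_b = [[100, 0, 112, 20], [140, 0, 150, 20]]
ground = np.vstack([
    project_to_ground(H_a, foot_points(boxes_a)),
    project_to_ground(H_b, foot_points(boxes_b)),
])
same_max, diff_min = projection_gap(ground, [1, 2, 1, 2])
print(same_max, diff_min)
```

If the largest same-person distance is not clearly smaller than the smallest different-person distance on ground-truth boxes, the homographies likely need refinement before geometry-based association can be expected to work.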