Difference between 'tid' and 'tracked_ids', 'bbox' and 'tracked_bbox'.

brjathu / PHALP

Code repository for the paper "Tracking People by Predicting 3D Appearance, Location & Pose". (CVPR 2022 Oral)


Difference between 'tid' and 'tracked_ids', 'bbox' and 'tracked_bbox'. #12

Closed zhixuany closed 1 year ago

zhixuany commented 1 year ago

Thanks for the fantastic work! While playing around with the code, I got a little confused about the difference in meaning between two pairs of keys, i.e. 'tid' vs. 'tracked_ids', and 'bbox' vs. 'tracked_bbox'. It seems that they are equivalent if "tracks.time_since_update==0" is true. Could you provide some explanation here? Thanks!!

brjathu commented 1 year ago

Hi, thanks for your interest in our work, and apologies for the confusion. Yes, you are correct. If "tracked_time" is 0, then "tid" and "bbox" come from the current detection; otherwise they belong to the last seen detection of the track. "tid" and "bbox" are always updated, and you can use "tracked_time" == 0 to filter the true detections. On the other hand, 'tracked_ids' and 'tracked_bbox' are updated only when there is a fresh detection to append to the track. https://github.com/brjathu/PHALP/blob/901698d1cce76f06f103f4ac0d891f17a245c6ba/demo_online.py#L106
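A minimal sketch of the filtering described above, for illustration only. It assumes each frame's result is a plain dict whose 'tid', 'bbox' and 'tracked_time' entries are parallel lists; that layout, the toy values, and the helper name `fresh_detections` are assumptions for this example and are not taken from the PHALP code.

```python
def fresh_detections(frame):
    """Return (tid, bbox) pairs for tracks matched to a detection in this frame.

    Assumes 'tid', 'bbox' and 'tracked_time' are parallel lists (hypothetical layout).
    """
    return [
        (tid, bbox)
        for tid, bbox, age in zip(frame["tid"], frame["bbox"], frame["tracked_time"])
        if age == 0  # age 0 means the track was updated by a detection in this frame
    ]


# Toy frame with made-up values: track 2 was not matched in this frame.
frame = {
    "tid": [1, 2, 3],
    "bbox": [[10, 20, 50, 100], [200, 40, 60, 120], [300, 10, 40, 90]],
    "tracked_time": [0, 2, 0],
    "tracked_ids": [1, 3],
    "tracked_bbox": [[10, 20, 50, 100], [300, 10, 40, 90]],
}

fresh = fresh_detections(frame)
fresh_ids = [tid for tid, _ in fresh]
fresh_boxes = [bbox for _, bbox in fresh]
```

Under these assumptions, `fresh_ids` and `fresh_boxes` line up with 'tracked_ids' and 'tracked_bbox' for the same frame.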

Hope this explains the keys; please feel free to ask if you need more explanation.

zhixuany commented 1 year ago

Thank you so much for the prompt reply. So basically:

['tracked_ids', 'tracked_bbox'] only keep tracklets with matched detections in the current frame, while ['tid', 'bbox'] additionally include unmatched tracklets. The former is always a subset of the latter. The ages of all tracklets are stored in 'tracked_time'.
If a tracklet do have have matched detection for a few consecutive frames, then a copy of its data (e.g. 'bbox', 'smpl', 'camera', 'center') from the last matched frame is duplicated, as if the person just stayed still for a period, until he/she gets matched again or reaches the max_age.
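The two points above can be mimicked with a toy bookkeeping loop. This is only a sketch of the behaviour as summarized here, not PHALP's implementation: the function `step`, the track dict layout, the string placeholders for boxes, and the rule that a track is dropped once its age exceeds `max_age` are all assumptions made for the example.

```python
def step(tracks, matched, max_age):
    """Advance the toy tracker by one frame and return that frame's record.

    tracks:  dict tid -> {"bbox": ..., "age": int}, modified in place
    matched: dict tid -> bbox for detections matched in this frame
    """
    for tid, bbox in matched.items():
        tracks[tid] = {"bbox": bbox, "age": 0}  # fresh detection resets the age
    for tid in list(tracks):
        if tid not in matched:
            tracks[tid]["age"] += 1  # unmatched: keep the last matched data
            if tracks[tid]["age"] > max_age:  # assumed drop rule
                del tracks[tid]
    return {
        "tid": list(tracks),
        "bbox": [t["bbox"] for t in tracks.values()],
        "tracked_time": [t["age"] for t in tracks.values()],
        "tracked_ids": [tid for tid, t in tracks.items() if t["age"] == 0],
        "tracked_bbox": [t["bbox"] for t in tracks.values() if t["age"] == 0],
    }


tracks = {}
detections_per_frame = [
    {1: "A0", 2: "B0"},  # both people detected
    {1: "A1"},           # person 2 missed, e.g. occluded
    {1: "A2"},           # person 2 still missed
]
records = [step(tracks, matched, max_age=1) for matched in detections_per_frame]
```

In the second record of this toy run, track 2 still appears under 'tid' with its last matched box "B0" and a 'tracked_time' of 1, while 'tracked_ids' contains only track 1; in the third record track 2 has been dropped.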

Is this understanding correct?

brjathu commented 1 year ago

Did you mean "If a tracklet does not have a matched detection"? If so, you are correct. These copies are used to visualize "ghosts", whose location and pose are updated with our prediction model, for example during occlusion. You can visualize them by setting GHOST_FULL_FAST in the render type.

zhixuany commented 1 year ago

Yes, sorry, that was a typo.

Got it! Thank you very much!