Question about learned feature query

PeizeSun / TransTrack

Multiple Object Tracking with Transformer

MIT License


Question about learned feature query #12

Closed zy1296 closed 3 years ago

zy1296 commented 3 years ago

Hi Peize,

Thanks for the wonderful work of TransTrack!

I went through your paper and am still a little confused about the learned feature used for object detection. As you say in the paper, a learned feature is a set of parameters. So, what are these parameters? Could you explain this in more detail?

Thanks in advance for your help!

PeizeSun commented 3 years ago

Hi~ The idea that "a learned feature is a set of parameters" follows two recent papers, DETR and Deformable DETR. Simply put, the Transformer decoder takes this set of learned features (N×C) as input, lets them interact with the image feature map, and then decodes them into bounding boxes (N×4) and classification scores (N×K), where N is the number of objects and K is the number of categories.
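To make the shapes in the reply above concrete, here is a minimal sketch in plain Python. It is not the TransTrack code: the toy sizes, the single simplified cross-attention step, and the names `learned_queries`, `cross_attend`, `W_box` and `W_cls` are all assumptions made for illustration. In DETR itself the learned queries are stored as an `nn.Embedding(num_queries, hidden_dim)` table that is trained by backpropagation; here the table is only randomly initialised.

```python
import math
import random

random.seed(0)

N, C, K = 3, 4, 2   # toy sizes: number of queries, feature dim, categories
HW = 5              # flattened image feature map positions (assumed)

def rand_matrix(rows, cols):
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def matmul(a, b):
    # (n x m) times (m x p) -> (n x p)
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

# The "learned feature": an N x C table of free parameters.
# Training would update these values; this sketch only initialises them.
learned_queries = rand_matrix(N, C)

# Stand-in for the encoder's image feature map, flattened to HW x C.
image_features = rand_matrix(HW, C)

def cross_attend(queries, memory):
    """One simplified cross-attention step: each query reads from the image features."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    memory_t = [list(col) for col in zip(*memory)]        # C x HW
    scores = matmul(queries, memory_t)                    # N x HW
    weights = [softmax([s * scale for s in row]) for row in scores]
    return matmul(weights, memory)                        # N x C

# Hypothetical linear prediction heads (no bias, random weights).
W_box = rand_matrix(C, 4)
W_cls = rand_matrix(C, K)

decoded = cross_attend(learned_queries, image_features)                # N x C
boxes = [[sigmoid(v) for v in row] for row in matmul(decoded, W_box)]  # N x 4
class_scores = matmul(decoded, W_cls)                                  # N x K
```

Each of the N rows of `learned_queries` yields exactly one box and one score vector, so N acts as an upper bound on the number of objects that can be predicted per image.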

zy1296 commented 3 years ago

Thanks! Wish you all the best!

sramakrishnan247 commented 3 years ago

@zy1296 @PeizeSun Thanks for sharing the code! In your paper you state that the learned object query detects objects in the current frame, while the object feature query from the previous frame associates objects of the current frame with the previous ones.

Could you please point out where exactly this happens in the code? I understand that for a single key there are two decoders, unlike DETR, where there is just one. I'm still a little confused and would appreciate it if you could point me to the relevant part.

PeizeSun commented 3 years ago

the learned object query detects objects: https://github.com/PeizeSun/TransTrack/blob/main/models/deformable_detrtrack_test.py#L168

the object feature query from the previous frame: https://github.com/PeizeSun/TransTrack/blob/main/models/deformable_detrtrack_test.py#L200
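As a rough illustration of the two query types discussed in this thread, the sketch below runs two branches against the same image features: one driven by the learned object queries (detection) and one driven by object features carried over from the previous frame (association). This is not the code at the linked lines. The helper `attend`, the toy sizes, and the choice to reuse the previous frame's detection output as the next frame's queries are assumptions for illustration, and the box heads and the matching of detection boxes to tracking boxes are omitted.

```python
import math
import random

random.seed(0)

N, C, HW = 3, 4, 5   # toy sizes: queries, feature dim, image positions

def rand_matrix(rows, cols):
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def attend(queries, memory):
    """Simplified cross-attention: every query takes a weighted average of memory rows."""
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, m)) / math.sqrt(len(q)) for m in memory]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * m[c] for w, m in zip(weights, memory)) for c in range(len(q))])
    return out

learned_queries = rand_matrix(N, C)   # fixed parameters, shared by all frames
prev_object_features = None           # nothing to carry over before the first frame
history = []

for frame_idx in range(2):
    # Stand-in for the current frame's encoded feature map (the shared keys).
    image_features = rand_matrix(HW, C)

    # Detection branch: learned queries look for objects in the current frame.
    det_features = attend(learned_queries, image_features)

    # Tracking branch: previous-frame object features look for the same
    # objects in the current frame. Skipped on the first frame.
    if prev_object_features is not None:
        track_features = attend(prev_object_features, image_features)
    else:
        track_features = []

    history.append((len(det_features), len(track_features)))

    # Assumption: the current detection features become the next frame's queries.
    prev_object_features = det_features
```

Both branches read the same `image_features`, which is the sense in which a single key feeds two decoders; only the queries differ.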