Hi~ The idea that "a learned feature is a set of parameters" follows two recent papers, DETR and Deformable DETR. Simply put, the Transformer decoder takes this set of learned features (NxC) as input, lets them interact with the image feature map, and then decodes them into bounding boxes (Nx4) and classification scores (NxK), where N is the number of objects and K is the number of categories.
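To make the shapes concrete, here is a minimal sketch in plain Python. It is not the TransTrack or DETR code: the single attention step, the helper names (`matmul`, `softmax`, `random_matrix`, `decode`) and the sizes are all assumptions chosen for illustration, and C is taken to be the feature dimension. The point is only that the "learned feature" is an NxC matrix of trainable numbers that plays the role of the decoder queries.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Stand-in for a trainable weight: random here, updated by training in practice."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def decode(queries, image_features, w_box, w_cls):
    """One simplified decoder step: cross-attention, then box and class heads."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    # (N x C) times (C x HW) gives N x HW attention scores.
    keys_t = [list(col) for col in zip(*image_features)]
    scores = matmul(queries, keys_t)
    attention = [softmax([s * scale for s in row]) for row in scores]
    # Each query gathers image content and adds it back as a residual: N x C.
    gathered = matmul(attention, image_features)
    updated = [[q + g for q, g in zip(q_row, g_row)]
               for q_row, g_row in zip(queries, gathered)]
    # Box head squashes to (0, 1); class head returns raw scores.
    boxes = [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in matmul(updated, w_box)]
    class_scores = matmul(updated, w_cls)
    return boxes, class_scores


rng = random.Random(0)
N, C, K, HW = 5, 8, 3, 12                    # hypothetical sizes
object_queries = random_matrix(N, C, rng)    # the learned feature: N x C parameters
image_features = random_matrix(HW, C, rng)   # flattened image feature map: HW x C
w_box = random_matrix(C, 4, rng)             # box head weights
w_cls = random_matrix(C, K, rng)             # classification head weights

boxes, class_scores = decode(object_queries, image_features, w_box, w_cls)
print(len(boxes), len(boxes[0]))                # 5 4
print(len(class_scores), len(class_scores[0]))  # 5 3
```

In a real model the query matrix would be registered as a trainable parameter and optimised together with the rest of the network, so it does not depend on any particular input image.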
Thanks! Wishing you all the best!
@zy1296 @PeizeSun
Thanks for sharing the code!
In your paper you state that the learned object query detects objects in the current frame, and that the object feature query from the previous frame associates objects of the current frame with the previous ones.
Could you please point out where exactly this happens in the code? I understand that for a single key there are two decoders, unlike DETR, where there is just one. I'm still a little confused and would appreciate it if you could point to the relevant part.
the learned object query detects objects: https://github.com/PeizeSun/TransTrack/blob/main/models/deformable_detrtrack_test.py#L168
the object feature query from the previous frame: https://github.com/PeizeSun/TransTrack/blob/main/models/deformable_detrtrack_test.py#L200
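For readers following the two links, a rough sketch of how the two sets of outputs could be tied together is given below. This is not the repository code. It assumes that one decoder pass produces detection boxes from the learned object query, that another produces track boxes from the previous frame's object features, and that the two are paired by box overlap. The greedy IoU pairing, the function names `iou` and `associate`, and the 0.5 threshold are illustrative assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def associate(track_boxes, detection_boxes, threshold=0.5):
    """Greedily pair track boxes with detection boxes, highest IoU first.

    Returns a dict mapping detection index to track index, plus the list of
    detection indices left unmatched (candidates for new tracks).
    """
    pairs = sorted(
        ((iou(t, d), ti, di)
         for ti, t in enumerate(track_boxes)
         for di, d in enumerate(detection_boxes)),
        reverse=True,
    )
    matches, used_tracks, used_dets = {}, set(), set()
    for score, ti, di in pairs:
        if score < threshold:
            break  # remaining pairs overlap too little to count as the same object
        if ti in used_tracks or di in used_dets:
            continue
        matches[di] = ti
        used_tracks.add(ti)
        used_dets.add(di)
    unmatched = [di for di in range(len(detection_boxes)) if di not in used_dets]
    return matches, unmatched


# Hypothetical boxes: tracks come from the previous-frame feature query,
# detections come from the learned object query.
track_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
detection_boxes = [(21, 21, 31, 31), (1, 0, 11, 10), (50, 50, 60, 60)]

matches, unmatched = associate(track_boxes, detection_boxes)
print(matches)    # {1: 0, 0: 1}
print(unmatched)  # [2]
```

Here detection 1 continues track 0, detection 0 continues track 1, and detection 2 has no counterpart, so it would start a new track.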
Hi Peize,
Thanks for the wonderful work of TransTrack!
I went through your paper and am still a little confused about the learned feature used for object detection. As you said in the paper, a learned feature is a set of parameters. So, what are the parameters? Could you explain this in more detail?
Thanks in advance for your help!