facebookresearch / detr

End-to-End Object Detection with Transformers
Apache License 2.0

What is the object query? #588

Open hitbuyi opened 1 year ago

hitbuyi commented 1 year ago

I find this concept hard to understand and have some questions about it:

1) How is an object query obtained? Does it come from the image? Who is responsible for designing it?
2) What does it look like? Is it a vector?
3) How are object queries used during training? What is the relationship between the GT and the object queries?

Shar-01 commented 1 year ago

Object queries are the inputs to the decoder. They are randomly initialized and then learned during training. They can be initialized with, e.g., torch.rand(num_queries, hidden_dim). The decoder output for each object query is passed to an FFN that predicts a class and a bbox, and these predictions are compared with the GT.
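
To make the shapes concrete, here is a minimal sketch of that flow. It is not the repository's code: the sizes are made up, the encoder output is random, PyTorch's stock nn.TransformerDecoder stands in for DETR's own decoder, the queries are fed directly as the decoder input, and a single nn.Linear stands in for each prediction head.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Illustrative sizes only (assumptions, not DETR's defaults).
num_queries, hidden_dim, num_classes = 10, 32, 5
batch_size, num_tokens = 2, 50

# Learned object queries: one trainable vector per query slot.
query_embed = nn.Embedding(num_queries, hidden_dim)

# Stand-in decoder built from PyTorch's stock layers.
decoder_layer = nn.TransformerDecoderLayer(d_model=hidden_dim, nhead=4)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=2)

# Simplified prediction heads applied to every decoder output vector.
class_head = nn.Linear(hidden_dim, num_classes + 1)  # +1 for "no object"
bbox_head = nn.Linear(hidden_dim, 4)                 # box as 4 numbers

# Fake encoder output with shape (tokens, batch, hidden_dim).
memory = torch.rand(num_tokens, batch_size, hidden_dim)

# The same queries are shared by every image in the batch.
queries = query_embed.weight.unsqueeze(1).repeat(1, batch_size, 1)

hs = decoder(queries, memory)    # (num_queries, batch, hidden_dim)
logits = class_head(hs)          # (num_queries, batch, num_classes + 1)
boxes = bbox_head(hs).sigmoid()  # (num_queries, batch, 4), values in [0, 1]

print(logits.shape, boxes.shape)
```

Each of the num_queries slots yields one class prediction and one box, so the number of queries caps the number of objects that can be predicted per image.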

KaiserW commented 9 months ago

Thanks for your reply. I was wondering how an input could be learned. I thought backpropagation only updates weights, while input values stay intact. I'm new to Transformer-like algorithms and any advice would be appreciated.
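
One way to see it: a "learned input" does not come from the data. It is a tensor registered as a parameter, so autograd computes a gradient for it and the optimizer updates it like any other weight. A toy sketch, where the tensor shapes, the nn.Linear layer, and the loss are all made up for illustration:

```python
import torch
from torch import nn

torch.manual_seed(0)

# Registering the tensor as a Parameter makes it a trainable leaf.
queries = nn.Parameter(torch.rand(4, 8))
layer = nn.Linear(8, 1)

# The optimizer is told to update the "input" along with the layer weights.
optimizer = torch.optim.SGD([queries, *layer.parameters()], lr=0.1)

before = queries.detach().clone()

loss = layer(queries).pow(2).mean()  # arbitrary toy loss
loss.backward()                      # gradient flows back into `queries`
has_grad = queries.grad is not None

optimizer.step()                     # `queries` is updated in place
changed = not torch.equal(before, queries.detach())

print(has_grad, changed)
```

A plain torch.rand(4, 8) tensor that is not wrapped in nn.Parameter (and not handed to the optimizer) would stay fixed, which matches the intuition about ordinary inputs.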

AjibolaPy commented 9 months ago

Might be late to the party. I've gone through the code. The object queries are embedding weights plus zero matrices. My question/confusion is: are they trainable, since they are embedding weights?
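
A quick check, written as a simplified sketch rather than the repository's actual code (the variable names and sizes are assumptions): the weight of an nn.Embedding is an nn.Parameter with requires_grad=True by default, while the zero tensor is a plain tensor. Adding the two still lets gradients reach the embedding weights.

```python
import torch
from torch import nn

num_queries, hidden_dim, batch_size = 5, 16, 2  # illustrative sizes

query_embed = nn.Embedding(num_queries, hidden_dim)

# The embedding table is a Parameter, hence trainable by default.
is_param = isinstance(query_embed.weight, nn.Parameter)
trainable = query_embed.weight.requires_grad

# Broadcast the learned embeddings over the batch.
query_pos = query_embed.weight.unsqueeze(1).repeat(1, batch_size, 1)

# The zero tensor is an ordinary tensor, not a parameter.
tgt = torch.zeros_like(query_pos)

# Zeros plus embeddings: the sum still depends on the embedding weights.
decoder_input = tgt + query_pos
decoder_input.sum().backward()

# Each weight entry is used once per batch element, so its gradient
# under this toy sum equals batch_size.
grad = query_embed.weight.grad

print(is_param, trainable, tgt.requires_grad)
```

So under these assumptions the zeros carry no learnable state, and everything that is learned lives in the embedding weights.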