Open hitbuyi opened 1 year ago
Object queries are the inputs to the decoder. They are randomly initialized and then learned during training, like any other model weight. They can be initialized with, e.g., `torch.rand(num_queries, hidden_dim)`. The decoder output for each object query is passed to an FFN that predicts the class and bounding box, which are then compared with the ground truth (GT).
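To make the data flow concrete, here is a minimal sketch of that pipeline. It is not the DETR implementation: it uses PyTorch's stock `nn.TransformerDecoder`, a random tensor as a stand-in for the encoder output, and illustrative sizes (`num_classes`, `num_tokens`, and the head names are my assumptions).

```python
import torch
from torch import nn

# Illustrative sizes, chosen for this sketch only.
num_queries, hidden_dim, num_classes = 100, 256, 91

# Object queries: a randomly initialized tensor registered as a learnable parameter.
object_queries = nn.Parameter(torch.rand(num_queries, hidden_dim))

# Stock PyTorch decoder used as a stand-in for the DETR decoder.
decoder_layer = nn.TransformerDecoderLayer(d_model=hidden_dim, nhead=8, batch_first=True)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=2)

# Prediction heads applied to each decoded query.
class_head = nn.Linear(hidden_dim, num_classes + 1)  # +1 for a "no object" class
bbox_head = nn.Sequential(
    nn.Linear(hidden_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, 4)
)

batch_size, num_tokens = 2, 50
memory = torch.rand(batch_size, num_tokens, hidden_dim)  # stand-in for encoder output

# The same set of queries is shared by every image in the batch.
queries = object_queries.unsqueeze(0).expand(batch_size, -1, -1)
decoded = decoder(queries, memory)        # (batch, num_queries, hidden_dim)
class_logits = class_head(decoded)        # (batch, num_queries, num_classes + 1)
boxes = bbox_head(decoded).sigmoid()      # (batch, num_queries, 4), values in [0, 1]
```

Each of the `num_queries` rows yields one class prediction and one box, so the number of queries caps how many objects can be predicted per image.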
Thanks for your reply. I was wondering how an input could be learned. I thought backpropagation only updates weights, while input values are left intact. I'm new to Transformer-like algorithms, and any advice would be appreciated.
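One way to see how an "input" can be learned: if the tensor is registered as a parameter and handed to the optimizer, it is updated just like a weight, while ordinary data tensors are left alone. The toy example below is only a sketch; the names `queries` and `image_features`, the tiny linear layer, and the loss are all made up for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

queries = nn.Parameter(torch.rand(4, 8))  # learnable "input": tracked by autograd
image_features = torch.rand(4, 8)         # ordinary input data: not a parameter
layer = nn.Linear(8, 1)

# The optimizer only updates the tensors it is given.
optimizer = torch.optim.SGD([queries, *layer.parameters()], lr=0.1)

queries_before = queries.detach().clone()
features_before = image_features.clone()

# Arbitrary toy loss that depends on both tensors.
loss = layer(queries + image_features).pow(2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()

queries_changed = not torch.equal(queries.detach(), queries_before)
features_changed = not torch.equal(image_features, features_before)
```

After the step, `queries_changed` is `True` and `features_changed` is `False`: backpropagation computes gradients for anything that requires them, and the optimizer decides what gets updated.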
Might be late to the party. I've gone through the code. The object queries are embedding weights plus zero matrices. My question is: are they trainable, since they are embedding weights?
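A quick way to check this yourself is to inspect `requires_grad` on the embedding weight. The sketch below mirrors the pattern described above (embedding weights plus a zero tensor) but is a simplified stand-in, not the repository code; the variable names and sizes are assumptions.

```python
import torch
from torch import nn

num_queries, hidden_dim, batch_size = 100, 256, 2  # illustrative sizes

# nn.Embedding stores its table as an nn.Parameter, so it is trainable by default.
query_embed = nn.Embedding(num_queries, hidden_dim)
is_trainable = query_embed.weight.requires_grad

# Broadcast the learned embeddings over the batch: (num_queries, batch, hidden_dim).
query_pos = query_embed.weight.unsqueeze(1).repeat(1, batch_size, 1)

# The zero tensor is a fixed starting point for the decoder, not a parameter.
tgt = torch.zeros_like(query_pos)

# Any loss that depends on query_pos sends gradients back to the embedding table.
(tgt + query_pos).sum().backward()
has_grad = query_embed.weight.grad is not None
```

So in this setup the zeros carry no learned information; what gets trained is the embedding table, and it receives gradients as long as it is not frozen or left out of the optimizer.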
I find this concept hard to understand and have some questions about it: 1) How is an object query obtained? Is it derived from the image? Who is responsible for designing it? 2) What does it look like? Is it a vector? 3) During training, how is the object query used? What is the relationship between the GT and the object queries?