Jianqiuer closed this issue 2 years ago
You can refer to Conditional DETR or DAB DETR, in which decoder queries are specified as a content part and a positional part. Therefore, we can set decoder content embedding as class label embedding as they are both related to content features. Please feel free to ask any question that helps you understand this paper. Thank you.
Thanks for your reply. I have two questions about this part. How does the class label embedding correspond to a specific class label? And is label noising implemented for the content part of the query?
Yes, label noising is implemented for the content part. For the first question, you can use label embedding to embed a class label, just like word embedding in NLP. Thank you.
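To make this concrete, here is a minimal sketch of the idea described above: embed (possibly noised) class labels with an embedding table, just like word embedding in NLP, and use the result as the decoder content queries. All names and sizes below are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 80   # e.g. COCO; illustrative
HIDDEN_DIM = 256   # decoder hidden size; illustrative

# Embedding table mapping a class id to a content vector,
# analogous to a word-embedding lookup in NLP.
label_embed = nn.Embedding(NUM_CLASSES, HIDDEN_DIM)

def noise_labels(gt_labels: torch.Tensor, flip_prob: float = 0.2) -> torch.Tensor:
    """Label noising: randomly flip some ground-truth labels to random classes."""
    noised = gt_labels.clone()
    flip_mask = torch.rand(gt_labels.shape) < flip_prob
    random_labels = torch.randint_like(gt_labels, NUM_CLASSES)
    noised[flip_mask] = random_labels[flip_mask]
    return noised

gt_labels = torch.tensor([3, 17, 42])                   # ground-truth class ids
content_queries = label_embed(noise_labels(gt_labels))  # shape (3, HIDDEN_DIM)
```

The noised labels are only used to build the denoising part of the decoder queries during training; the matching part is unchanged.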
Thank you, it inspires me a lot.
Hi, thanks for the amazing work! May I ask where the "class label embedding" comes from? My understanding is that the weight of each class in the classifier (used for the final prediction) initializes the content embedding of the corresponding ground-truth class (for the denoising input). Is that right?
No. We add a new linear layer to embed the class labels.
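A minimal sketch of what "a new linear layer to embed the class labels" could look like: one-hot class vectors passed through a bias-free linear layer, which is equivalent to an embedding lookup. The layer name and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 80   # illustrative
HIDDEN_DIM = 256   # illustrative

# New linear layer embedding class labels, separate from the classifier head.
label_proj = nn.Linear(NUM_CLASSES, HIDDEN_DIM, bias=False)

labels = torch.tensor([5, 0, 23])                  # class ids
one_hot = F.one_hot(labels, NUM_CLASSES).float()   # shape (3, NUM_CLASSES)
content_queries = label_proj(one_hot)              # shape (3, HIDDEN_DIM)

# Without a bias, this is the same as indexing rows of the weight matrix,
# i.e. an nn.Embedding-style lookup.
same_queries = label_proj.weight.T[labels]
```

So whether it is implemented as a linear layer over one-hot vectors or as an embedding table, the result is a learned content vector per class.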
OK, I get it. Thanks for your explanation~
I am still a bit confused about how the class label embeddings are obtained. Do you mean that, say, for a one-hot vector (0,0,0,1,0...), you pass it through an MLP to obtain a feature vector, which then serves as the content query for the decoder?
Thanks for your excellent work. Could you give more details on how the decoder embedding is specified as the class label embedding?