IDEA-Research / DN-DETR

[CVPR 2022 Oral] Official implementation of DN-DETR
Apache License 2.0

How class embedding is implemented? #3

Closed Jianqiuer closed 2 years ago

Jianqiuer commented 2 years ago

Thanks for your excellent work. Could you give more details on how the decoder embedding is specified as the class label embedding?

FengLi-ust commented 2 years ago

You can refer to Conditional DETR or DAB DETR, in which decoder queries are specified as a content part and a positional part. Therefore, we can set decoder content embedding as class label embedding as they are both related to content features. Please feel free to ask any question that helps you understand this paper. Thank you.

Jianqiuer commented 2 years ago

Thanks for your reply. I have two questions about this part. How does the class label embedding correspond to a class label? And is label noising implemented for the content part of the query?

FengLi-ust commented 2 years ago

Yes, label noising is implemented for the content part. For the first question, you can use label embedding to embed a class label, just like word embedding in NLP. Thank you.
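The "embed a class label, just like word embedding in NLP" idea can be sketched roughly as below. This is an illustrative sketch, not the repo's actual code; the names `num_classes`, `hidden_dim`, and `noise_ratio` are assumptions for the example.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions, not DN-DETR's exact configuration.
num_classes, hidden_dim, noise_ratio = 80, 256, 0.2

# A learnable lookup table, analogous to a word-embedding layer in NLP.
label_embed = nn.Embedding(num_classes + 1, hidden_dim)  # +1 for an "unknown" class

gt_labels = torch.tensor([3, 17, 42, 3])  # ground-truth class indices for one image

# Label noising: randomly flip some labels to other classes before embedding.
flip = torch.rand(gt_labels.shape) < noise_ratio
random_labels = torch.randint(0, num_classes, gt_labels.shape)
noised_labels = torch.where(flip, random_labels, gt_labels)

# The embedded (possibly noised) labels serve as the content part of the
# denoising queries fed to the decoder.
content_queries = label_embed(noised_labels)
print(content_queries.shape)  # torch.Size([4, 256])
```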

Jianqiuer commented 2 years ago

Thank you, it inspires me a lot.

encounter1997 commented 2 years ago

> You can refer to Conditional DETR or DAB DETR, in which decoder queries are specified as a content part and a positional part. Therefore, we can set decoder content embedding as class label embedding as they are both related to content features. Please feel free to ask any question that helps you understand this paper. Thank you.

Hi, thanks for the amazing work! May I ask where the "class label embedding" comes from? My understanding is that the weight of each class in the classifier (used for the final prediction) initializes the content embedding of the corresponding ground-truth class (for the denoising input). Is that right?

FengLi-ust commented 2 years ago

No. We add a new linear layer to embed the class labels.

encounter1997 commented 2 years ago

> No. We add a new linear layer to embed the class labels.

OK, I get it. Thanks for your explanation~

YellowPig-zp commented 2 years ago

I am still a bit confused as to how the class label embeddings are obtained. Do you mean that, say, for a one-hot vector (0,0,0,1,0,...), you pass it through an MLP to obtain a feature vector, which then serves as the content query for the decoder part?
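The "linear layer" phrasing and the one-hot intuition describe the same operation: an embedding lookup is mathematically identical to multiplying a one-hot vector by a bias-free linear layer's weight matrix. A minimal sketch (illustrative dimensions, not the repo's code):

```python
import torch
import torch.nn as nn

# Toy sizes chosen for the example only.
num_classes, hidden_dim = 5, 8

embed = nn.Embedding(num_classes, hidden_dim)

label = torch.tensor([3])
one_hot = nn.functional.one_hot(label, num_classes).float()

via_lookup = embed(label)            # direct index into the embedding table
via_linear = one_hot @ embed.weight  # one-hot vector times the same weight matrix

print(torch.allclose(via_lookup, via_linear))  # True
```

So whether the layer is described as an embedding table or as a linear layer applied to one-hot labels, both views produce the same content vector.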