IDEA-Research / DINO

[ICLR 2023] Official implementation of the paper "DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection"
Apache License 2.0
2.15k stars 232 forks

Last layer output #160

Open anjugopinath opened 1 year ago

anjugopinath commented 1 year ago

I am trying to get the output of the last layer using PyTorch hooks. I am getting an output of dimension 900 by 91, but I expected an output of dimension 256 by 91.

This is the last layer I see when I execute `model.children()`:

[screenshot: output of `model.children()` showing the last layers of the model]

Also, what is `class_embed`, and why does it have 6 layers that look the same?

[screenshot: the `class_embed` module]
HaoZhang534 commented 1 year ago

> I am trying to get the output of the last layer using PyTorch hooks. I am getting an output of dimension 900 by 91, but I expected an output of dimension 256 by 91.
>
> This is the last layer I see when I execute `model.children()`. [screenshot]
>
> Also, what is `class_embed`, and why does it have 6 layers that look the same? [screenshot]

Which output variable did you get? I think the 900 by 91 tensor is the output logits: 900 queries by 91 classes. The `class_embed` heads in the 6 decoder layers share parameters, so they look identical.
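The two points above (why a hook on the classification head sees 900 × 91 rather than 256 × 91, and why the six `class_embed` entries print identically) can be illustrated with a minimal stand-alone sketch. This is not the actual DINO source; the module names and the parameter sharing are modeled on the explanation above (900 queries, 91 classes, 256-dim decoder features, one `Linear` head reused across 6 decoder layers):

```python
import torch
import torch.nn as nn

# Stand-in for DINO's per-decoder-layer classification heads: one
# Linear(256, 91) module repeated 6 times, so all entries share parameters.
shared_head = nn.Linear(256, 91)
class_embed = nn.ModuleList([shared_head for _ in range(6)])

# All six entries point at the same weight tensor:
assert all(head.weight is shared_head.weight for head in class_embed)

# A forward hook on the head sees (num_queries, num_classes) = (900, 91)
# as its OUTPUT; 256 is the input feature dimension the Linear consumes.
captured = {}
def hook(module, inputs, output):
    captured["in"] = inputs[0].shape   # (900, 256) decoder features
    captured["out"] = output.shape     # (900, 91) class logits

class_embed[-1].register_forward_hook(hook)
decoder_features = torch.randn(900, 256)  # 900 queries, 256-dim features
logits = class_embed[-1](decoder_features)
print(captured["in"], captured["out"])  # torch.Size([900, 256]) torch.Size([900, 91])
```

So a hook on `class_embed` reports the 900 × 91 logits because that is the layer's output; the 256 you expected is the size of its input, not its output.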

anjugopinath commented 1 year ago

I am trying to visualize the model architecture.

In `class_embed`, (0) has dimension 256 by 91. If it were feeding into (1) of `class_embed`, shouldn't (1)'s input dimension be 91?

So I realize (0) of `class_embed` is not actually feeding into (1) of `class_embed`. Could you explain this part of the architecture to me?

[screenshot: the `class_embed` and MLP modules in the printed architecture]

Also, the last layer of the MLP has dimension 256 by 4. So shouldn't the first dimension of `class_embed` have a size of 4?
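A likely resolution to both questions, sketched as toy code rather than the actual DINO source: the classification head (`class_embed`) and the box-regression MLP are *parallel* heads, each consuming the same 256-dim decoder features; neither feeds into the other, so `class_embed`'s input size stays 256 rather than 4 or 91. The layer sizes below (256 hidden units, 3-layer MLP, sigmoid-normalized boxes) are assumptions for illustration:

```python
import torch
import torch.nn as nn

# Two parallel prediction heads over the same decoder output.
class_head = nn.Linear(256, 91)        # 256-dim features -> 91 class logits
bbox_head = nn.Sequential(             # 3-layer MLP: 256 -> 256 -> 256 -> 4
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 4),
)

features = torch.randn(900, 256)       # one decoder layer's output: 900 queries
logits = class_head(features)          # (900, 91) class scores per query
boxes = bbox_head(features).sigmoid()  # (900, 4) normalized box per query
print(logits.shape, boxes.shape)
```

Printing the model with `model.children()` lists these sibling heads one after another, which makes them look chained even though each independently reads the 256-dim features.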