Atten4Vis / ConditionalDETR

This repository is an official implementation of the ICCV 2021 paper "Conditional DETR for Fast Training Convergence" (https://arxiv.org/abs/2108.06152).
Apache License 2.0

questions about provided conditional detr model #31

Closed. xz-123-new closed this issue 1 year ago.

xz-123-new commented 1 year ago

Thanks for your excellent work! I have a question about the provided model. In the provided Conditional DETR model "conditional detr resnet50", transformer.decoder.layer.cross_attn.out_proj.weight/bias have dimensions 256x256 and 256, respectively. But since the input of this cross-attention is the concatenation of two 256-d queries, it seems they should be 512x512 and 512. This really confuses me. Looking forward to your help, thanks!

charlesCXK commented 1 year ago

Hi, the out_proj layer is applied to the value (https://github.com/Atten4Vis/ConditionalDETR/blob/ead865cbcf88be10175b79165df0836c5fcfc7e3/models/transformer.py#L317), which is 256-d.
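
For anyone hitting the same confusion, here is a minimal shape sketch (plain tensor ops, not the repository's implementation; the query/key counts are illustrative) of why the attention output, and hence out_proj, stays 256-d even though the query and key are 512-d:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Queries/keys are concatenations of a content and a spatial embedding
# (2 * d_model = 512-d); the value stays at d_model = 256-d, so the
# output projection maps 256 -> 256.
d_model, nhead, num_queries, hw = 256, 8, 300, 100    # sizes are illustrative
q = torch.randn(num_queries, 2 * d_model)             # 512-d query
k = torch.randn(hw, 2 * d_model)                      # 512-d key
v = torch.randn(hw, d_model)                          # 256-d value

out_proj = nn.Linear(d_model, d_model)                # 256 x 256, as in the checkpoint

head_qk = 2 * d_model // nhead                        # 64 dims per head for q/k
head_v = d_model // nhead                             # 32 dims per head for v
qh = q.view(num_queries, nhead, head_qk).transpose(0, 1)   # (nhead, nq, 64)
kh = k.view(hw, nhead, head_qk).transpose(0, 1)             # (nhead, hw, 64)
vh = v.view(hw, nhead, head_v).transpose(0, 1)               # (nhead, hw, 32)

# Attention weights use the 512-d q/k; the weighted sum is over the 256-d value.
attn = F.softmax(qh @ kh.transpose(-2, -1) / head_qk ** 0.5, dim=-1)
out = (attn @ vh).transpose(0, 1).reshape(num_queries, d_model)  # 256-d per query
print(out_proj(out).shape)                            # torch.Size([300, 256])
```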

xz-123-new commented 1 year ago

Sorry to disturb you again. My question is about the out_proj of cross_attn, i.e. self.cross_attn = nn.MultiheadAttention(d_model * 2, nhead, dropout=dropout, vdim=d_model), not the out_proj you mentioned. In the source code of nn.MultiheadAttention, out_proj is set as self.out_proj = NonDynamicallyQuantizableLinear(embed_dim, embed_dim, bias=bias, **factory_kwargs), and in your code embed_dim is set to d_model * 2, i.e. 512. So I think out_proj should be 512-d instead of 256-d, but the provided model is all 256-d.
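
For context, stock torch.nn.MultiheadAttention does build out_proj as embed_dim x embed_dim even when vdim differs, which is where the 512 x 512 expectation comes from:

```python
import torch.nn as nn

# Stock PyTorch: out_proj is always embed_dim x embed_dim, regardless of vdim.
mha = nn.MultiheadAttention(embed_dim=512, num_heads=8, dropout=0.0, vdim=256)
print(mha.out_proj.weight.shape)  # torch.Size([512, 512])
```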

xz-123-new commented 1 year ago

Sorry, I had wrongly imported MultiheadAttention from torch.nn instead of your modified version. Now the problem is solved, thanks.
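
A minimal sketch of the fix, assuming the modified attention class lives in models/attention.py of this repository (the import path is my assumption) and keeps the same constructor signature as torch.nn.MultiheadAttention:

```python
from models.attention import MultiheadAttention  # repo's modified version, not torch.nn

d_model, nhead = 256, 8
# With the modified class, out_proj should match the checkpoint's 256 x 256 weights.
cross_attn = MultiheadAttention(d_model * 2, nhead, dropout=0.0, vdim=d_model)
print(cross_attn.out_proj.weight.shape)  # expected: torch.Size([256, 256])
```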