Open ehdrndd opened 1 year ago
I've been digging deeper into the code, and I found something odd.
Look at the decoder input: the object queries go directly into the value.
But in the DecoderLayer code:

    q = k = self.with_pos_embed(tgt, query_pos)
    tgt2 = self.self_attn(q, k, value=tgt, attn_mask=tgt_mask,
                          key_padding_mask=tgt_key_padding_mask)[0]
    tgt = tgt + self.dropout1(tgt2)
    tgt = self.norm1(tgt)
    tgt2 = self.multihead_attn(query=self.with_pos_embed(tgt, query_pos),
                               key=self.with_pos_embed(memory, pos),
                               value=memory, attn_mask=memory_mask,
                               key_padding_mask=memory_key_padding_mask)[0]
    tgt = tgt + self.dropout2(tgt2)
    tgt = self.norm2(tgt)
    tgt2 = self.linear2(self.dropout(self.activation(self.linear1(tgt))))
    tgt = tgt + self.dropout3(tgt2)
    tgt = self.norm3(tgt)

In the first decoder layer, tgt is a zero tensor of size (num_queries=100, batch_size, hidden_dim=256), and query_pos is the object queries...
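To make the question concrete, here is a minimal sketch of just the first self-attention step, assuming PyTorch's stock `nn.MultiheadAttention` and `nn.LayerNorm` with their default initialization (zero biases). The sizes and the `query_embed` name are illustrative assumptions, not taken from a trained checkpoint. It suggests that with an all-zero `tgt` the first self-attention contributes nothing at initialization, so the object queries first enter through the query of the cross-attention:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes (assumed for this sketch)
num_queries, batch_size, hidden_dim, nheads = 100, 2, 256, 8

# Learned object queries, used here only as a positional term (query_pos)
query_embed = nn.Embedding(num_queries, hidden_dim)
self_attn = nn.MultiheadAttention(hidden_dim, nheads, dropout=0.0)
norm1 = nn.LayerNorm(hidden_dim)

with torch.no_grad():
    # (num_queries, batch_size, hidden_dim)
    query_pos = query_embed.weight.unsqueeze(1).repeat(1, batch_size, 1)
    # Content stream for the first decoder layer: all zeros
    tgt = torch.zeros_like(query_pos)

    # Same wiring as the DecoderLayer snippet: q = k = tgt + query_pos, value = tgt
    q = k = tgt + query_pos
    tgt2 = self_attn(q, k, value=tgt)[0]

    # Residual + norm, then the query that cross-attention would receive
    tgt_after = norm1(tgt + tgt2)
    cross_query = tgt_after + query_pos

print(tgt2.abs().max())                       # largest self-attention output entry
print(torch.allclose(cross_query, query_pos)) # does the cross-attention query equal query_pos?
```

Under these assumptions the value projection of a zero tensor is just its bias, so every query receives the same vector from self-attention regardless of the attention weights. Once the biases are trained that vector is no longer zero, but it is still identical across queries in the first layer; the per-query information comes from `query_pos`.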
I have the same confusion about this issue.
@ehdrndd On the last page of the original paper, they give a short code listing for DETR, but there the decoder's input is just a randomly initialized tensor of size (100, 256).
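For comparison, a rough sketch of that simplified wiring, assuming the stock `nn.Transformer` and reduced layer counts to keep it quick. The names `query_pos` and `src` and all sizes here are assumptions for illustration, not a copy of the paper's listing. In this variant the (100, 256) tensor is passed as the decoder's `tgt` itself, so it acts as query, key, and value in the first self-attention instead of being added as a positional term:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

hidden_dim, nheads = 256, 8

# Assumed small model: 1 encoder and 1 decoder layer, only to keep the sketch fast
transformer = nn.Transformer(hidden_dim, nheads,
                             num_encoder_layers=1, num_decoder_layers=1)

# Randomly initialized, learnable decoder input of size (100, 256)
query_pos = nn.Parameter(torch.rand(100, hidden_dim))

# Stand-in for flattened image features: (H*W, batch, hidden_dim)
src = torch.rand(49, 1, hidden_dim)

with torch.no_grad():
    # The queries are the decoder's content input directly
    out = transformer(src, query_pos.unsqueeze(1))

print(out.shape)  # (num_queries, batch, hidden_dim)
```

So the two versions differ in where the learned queries enter: as the decoder content (`tgt`) here, versus as an additive `query_pos` on top of a zero `tgt` in the DecoderLayer code quoted above.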