Open Wangzs0228 opened 3 years ago
Same question here.
The passage below is from the original paper. However, this logic does not seem to be implemented in the current code base.
In the first stage, given the output feature maps of the encoder, a detection head is applied to each pixel. The detection head is of a 3-layer FFN for bounding box regression, and a linear projection for bounding box binary classification (i.e., foreground and background), respectively.
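To make the quoted description concrete, here is a minimal PyTorch sketch of such a per-pixel head. It is not taken from the repository: the class name `PerPixelDetectionHead`, the attribute names, and `d_model=256` are assumptions for illustration, and the box output is left as raw values (the sketch skips any reference-point offsets or normalization).

```python
import torch
from torch import nn


class PerPixelDetectionHead(nn.Module):
    """Hypothetical first-stage head: a box FFN plus a binary foreground projection."""

    def __init__(self, d_model: int = 256):
        super().__init__()
        # 3-layer FFN for bounding box regression (4 box parameters per pixel).
        self.bbox_ffn = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(),
            nn.Linear(d_model, d_model), nn.ReLU(),
            nn.Linear(d_model, 4),
        )
        # Single linear projection: one foreground/background logit per pixel.
        self.objectness = nn.Linear(d_model, 1)

    def forward(self, memory: torch.Tensor):
        # memory: (batch, num_pixels, d_model), the flattened encoder output.
        boxes = self.bbox_ffn(memory)                 # (batch, num_pixels, 4)
        logits = self.objectness(memory).squeeze(-1)  # (batch, num_pixels)
        return boxes, logits


head = PerPixelDetectionHead(d_model=256)
memory = torch.randn(2, 100, 256)  # dummy encoder output: 2 images, 100 pixels each
boxes, logits = head(memory)
```

The point of the sketch is the output width of the classification branch: the paper's description implies one logit per pixel, whereas a head built with `num_classes` outputs produces one logit per dataset class.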
Same question. It seems that the `num_classes` in `self.decoder.class_embed` is the same as the number of classes in the whole dataset.
Although they use sigmoid + BCE loss, I doubt whether it is reasonable to judge foreground only by the score of the first category.
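A toy illustration of the concern, in plain Python with made-up logits (the numbers and the helper `topk_indices` are hypothetical, not from the repository): if the head outputs one logit per dataset class, ranking proposals by the first category alone can select a different set than ranking by the best score over all categories.

```python
def topk_indices(scores, k):
    """Return the indices of the k largest scores, highest first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Hypothetical per-proposal logits: rows are proposals, columns are classes.
logits = [
    [2.0, -3.0, -3.0],   # proposal 0: high only in category 0
    [-4.0, 5.0, -2.0],   # proposal 1: confident in category 1
    [0.5, -1.0, -1.0],   # proposal 2: mildly positive in category 0
    [-5.0, -2.0, 4.0],   # proposal 3: confident in category 2
]

# Rank by the first category only (the behaviour questioned in this thread).
by_first_channel = topk_indices([row[0] for row in logits], k=2)

# Rank by the maximum logit over all categories (one possible alternative).
by_max_class = topk_indices([max(row) for row in logits], k=2)

print(by_first_channel, by_max_class)
```

With these numbers the two rankings pick disjoint proposals. Whether that matters in practice depends on how the first-stage output is supervised: if index 0 is trained as a class-agnostic foreground score, selecting on it would be intentional. This thread does not settle which is the case.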
Why 0 in this code?