Closed alesolano closed 4 years ago
You shouldn't have the spatial dimensions for each image region. Did you try taking the output of the average pooling?
The `cls_prob` tensor should have a shape equal to `(N, 1601)`, where 1600 (plus 1 for the background) is the number of possible detection classes.
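As a minimal sketch of the suggestion above, assume the `res5c` blob has already been copied into a NumPy array of shape `(N, 2048, 14, 14)`. The array below is random stand-in data and `global_average_pool` is a hypothetical helper name, not part of either repository. Averaging over the two spatial axes yields one 2048-dimensional vector per region:

```python
import numpy as np


def global_average_pool(feature_maps):
    """Average a (N, C, H, W) array over H and W, returning (N, C)."""
    if feature_maps.ndim != 4:
        raise ValueError("expected a 4-D array shaped (N, C, H, W)")
    return feature_maps.mean(axis=(2, 3))


# Random stand-in for the res5c blob of N = 5 regions (not real detector output).
rng = np.random.default_rng(0)
res5c = rng.standard_normal((5, 2048, 14, 14)).astype(np.float32)

features = global_average_pool(res5c)
print(features.shape)  # (5, 2048)
```

If the network already exposes a pooled layer, reading that layer directly avoids this extra step.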
@marcellacornia, thanks for the information. @alesolano, I wrote these lines of code, and they worked fine when I attached them to the M2 model. Please try them if you need:
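```python
import torch.nn as nn
from torchvision.models import resnet50


# The class name and the default argument values are placeholders;
# the original snippet showed only the constructor body.
class Encoder(nn.Module):
    def __init__(self, encoded_image_size=14, embed_dim=512):
        super().__init__()
        # Original ResNet-50
        resnet = resnet50(pretrained=True)
        # Remove the final pooling and linear layers (we are not doing classification)
        modules = list(resnet.children())[:-2]
        self.resnet = nn.Sequential(*modules)
        self.dropout = nn.Dropout(0.5)
        # Pool the feature map to a fixed size so that input images can vary in size
        self.adaptive_pool = nn.AdaptiveAvgPool2d((encoded_image_size, encoded_image_size))
        # Added layers
        self.avgpool = nn.AvgPool2d(encoded_image_size)
        self.affine_embed = nn.Linear(2048, embed_dim)
        self.conv_1d = nn.Conv2d(2048, embed_dim, kernel_size=1)
```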
Thanks @marcellacornia, @TranTony for the responses! I'll try that last snippet of code tomorrow and post the outcome here so we can eventually close this issue.
UPDATE: Yes @marcellacornia, that's exactly what I needed. The `pool5_flat` layer has an output of `Nx2048`. Thanks!
First of all, congrats on your work and thanks for releasing the code! 😄
Following #2 and #5, I'm trying to run the network on a new set of images. To get the image features I went to the bottom-up attention repo you suggested here, using the Faster-R-CNN-ResNet101 model with these weights.
My problem is the following: how to transform the outputs of this feature extractor into the format you require?
Following the Readme and code, I understand that you need to express the features as a `Nx2048` tensor. Following this line, I understand that you also need a `cls_prob` vector to sort your feature vector.

Now, I took the blob `res5c` for the features and `cls_prob` for the probabilities, but the dimensions are not quite as I expected. `res5c` has dimension `Nx2048x14x14`, so the `14x14` should be mapped into one number I guess. And `cls_prob` has `Nx1061`, which is not coherent with the rest.

Am I missing something?
Thanks!
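For readers with the same question, the shapes discussed in this thread can be checked, and the regions ranked, with a short sketch like the one below. It assumes that regions are ordered by their highest class probability, most confident first, which is only one plausible reading of "sorting by `cls_prob`" and not a confirmed description of the repository's behaviour. `sort_regions_by_confidence` is a hypothetical helper and the arrays are random stand-ins.

```python
import numpy as np

NUM_CLASSES = 1601  # 1600 detection classes plus 1 background class


def sort_regions_by_confidence(features, cls_prob):
    """Reorder region features by descending maximum class probability."""
    if features.shape[0] != cls_prob.shape[0]:
        raise ValueError("features and cls_prob must describe the same N regions")
    if cls_prob.shape[1] != NUM_CLASSES:
        raise ValueError(f"expected {NUM_CLASSES} class scores per region")
    confidence = cls_prob.max(axis=1)  # best score of each region
    order = np.argsort(-confidence)    # most confident region first
    return features[order], order


# Random stand-ins for N = 4 regions (not real detector output).
rng = np.random.default_rng(0)
features = rng.standard_normal((4, 2048))
cls_prob = rng.random((4, NUM_CLASSES))
cls_prob /= cls_prob.sum(axis=1, keepdims=True)  # make each row sum to 1

sorted_features, order = sort_regions_by_confidence(features, cls_prob)
print(sorted_features.shape)  # (4, 2048)
```

The explicit shape checks raise an error early when `cls_prob` does not have the expected 1601 columns.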