Closed: henriquepm closed this issue 1 year ago
Thanks for these questions.
Thanks for the quick answer. I don't know at the moment; I'm planning to run some experiments with the backbones and wanted to understand the starting point as well as possible.
@henriquepm I'm coming back to this to check the /4 and /8 question. I added a bunch of shape prints to the forward of Encoder_res101, and right now I'm not sure why you said "the approach in the code will produce FM of dimension C x H/4 x W/4":
```python
def forward(self, x):
    print('x in', x.shape)
    x1 = self.backbone(x)
    print('x1', x1.shape)
    x2 = self.layer3(x1)
    print('x2', x2.shape)
    x = self.upsampling_layer(x2, x1)
    print('x up', x.shape)
    x = self.depth_layer(x)
    print('x d', x.shape)
    return x
```
The output is:

```
x in torch.Size([6, 3, 448, 800])
x1   torch.Size([6, 512, 56, 100])
x2   torch.Size([6, 1024, 28, 50])
x up torch.Size([6, 512, 56, 100])
x d  torch.Size([6, 128, 56, 100])
```
which looks like H/8, W/8, as the paper says. I may easily have missed something since I haven't used the repo in a while, so please let me know if you see anything wrong.
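For what it's worth, the /8 factor can be checked directly from the printed shapes (a quick sketch; the numbers are copied from the printout above):

```python
# Sanity check: the spatial downsampling factor implied by the printed shapes.
# Values copied from the printout above (batch 6, RGB input 448x800).
in_h, in_w = 448, 800    # x in: [6, 3, 448, 800]
out_h, out_w = 56, 100   # x d:  [6, 128, 56, 100]

print(in_h // out_h, in_w // out_w)  # -> 8 8, i.e. H/8 x W/8
```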
Hey, that looks totally right, sorry about that. I took another look at the notebook where I was dissecting the network: I was comparing the sizes against the output of the first conv layer of the ResNet instead of against the actual input, so I was missing a factor of 1/2.
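To spell out where the 1/2 factor hides, here is a hypothetical stride-bookkeeping sketch. The stage strides are assumptions based on the standard torchvision ResNet layout (7x7 conv with stride 2, then a stride-2 max pool, then residual stages), not code from this repo:

```python
# Hypothetical stride bookkeeping for a ResNet-style encoder truncated
# after layer2, illustrating the 1/2-factor mix-up described above.
stage_strides = {
    'conv1': 2,    # first 7x7 conv, stride 2 (assumed standard ResNet stem)
    'maxpool': 2,  # 3x3 max pool, stride 2
    'layer1': 1,   # no further downsampling
    'layer2': 2,   # last downsampling stage kept by the encoder
}

def cumulative_stride(stages):
    # Total downsampling is the product of the per-stage strides.
    s = 1
    for stride in stages.values():
        s *= stride
    return s

# Measured against the network input: 2 * 2 * 1 * 2 = 8  -> H/8 x W/8
print(cumulative_stride(stage_strides))

# Measured against the output of conv1 (which is already at 1/2 resolution),
# the remaining factor is only 2 * 1 * 2 = 4 -> the misleading H/4 x W/4
after_conv1 = {k: v for k, v in stage_strides.items() if k != 'conv1'}
print(cumulative_stride(after_conv1))
```

So comparing feature-map sizes against the conv1 output instead of the input drops exactly one factor of 2, turning an apparent /8 into /4.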
Perfect, no problem. Thanks for confirming so quickly!
Hi! First of all, thank you for the great quality of this work, both the paper and the code. I have a couple of questions regarding the backbone: