Reagan1311 / DABNet

Depth-wise Asymmetric Bottleneck for Real-time Semantic Segmentation (BMVC2019)
https://github.com/Reagan1311/DABNet
MIT License

DownSamplingBlock torch.cat channel doesn't match #24

Closed: Emilycs09 closed this issue 4 years ago

Emilycs09 commented 4 years ago

hello,

Your work has been very helpful!

I found a problem in DABNet.py at line 98: `output = torch.cat([output, max_pool], 1)`. This line concatenates the outputs of `conv3x3(nIn, nConv, kSize=3, stride=2, padding=1)` and `MaxPool2d(2, stride=2)`, but these two outputs can have different spatial dimensions [h, w].

According to the PyTorch docs, the output height of both Conv2d and MaxPool2d is

H_out = floor((H_in + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1)

(and likewise for the width). So the conv output (kernel 3, stride 2, padding 1) is floor((h - 1)/2 + 1), while the pool output (kernel 2, stride 2, no padding) is floor((h - 2)/2 + 1). For odd h these differ by one, and the concatenation breaks down.
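For reference, here is a minimal sketch (not from the repo; the channel counts are illustrative, only the spatial shapes matter) that reproduces the mismatch:

```python
import torch
import torch.nn as nn

# Stand-ins for the two branches of DownSamplingBlock
conv = nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1)
pool = nn.MaxPool2d(2, stride=2)

x_even = torch.randn(1, 3, 768, 768)
print(conv(x_even).shape, pool(x_even).shape)
# -> [1, 16, 384, 384] and [1, 3, 384, 384]: h/w match

x_odd = torch.randn(1, 3, 769, 769)
print(conv(x_odd).shape, pool(x_odd).shape)
# -> [1, 16, 385, 385] and [1, 3, 384, 384]: h/w differ, so
# torch.cat([conv(x_odd), pool(x_odd)], 1) raises a RuntimeError
```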

Reagan1311 commented 4 years ago

Hi, thank you for the reminder. This is not a problem: if you set the input size to an even number, the two sizes will be the same. However, if you set the crop size to an odd number (like 769 x 769), you will need to modify this code.

Emilycs09 commented 4 years ago

I understand what you mean, but in the val or test process the input images are not cropped, and they can have different sizes, either even or odd. In that case the problem occurs.

To avoid this, I changed your code slightly. Original: `self.max_pool = nn.MaxPool2d(2, stride=2)`; edited: `self.max_pool = nn.MaxPool2d(3, stride=2, padding=1)`.

By changing the kernel size of the maxpool layer and adding padding, the outputs of the maxpool and the conv will always have the same spatial size. But this changes the network structure, and I'm not sure whether it will affect the performance or not.
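A quick sanity check (my own sketch, again with illustrative channel counts) that the edited pooling matches the strided conv for both even and odd sizes:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1)
pool = nn.MaxPool2d(3, stride=2, padding=1)  # edited version

for size in (768, 769, 512, 513):
    x = torch.randn(1, 3, size, size)
    out = torch.cat([conv(x), pool(x)], 1)  # no shape error for any size
    print(size, tuple(out.shape))
# 768 -> (1, 19, 384, 384), 769 -> (1, 19, 385, 385), ...
```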

Reagan1311 commented 4 years ago

Thank you for your suggestion; you can test the performance after the modification. The downsampling block follows the ENet paper, so the pooling kernel size is kept the same as in ENet, 2x2.