Pongpisit-Thanasutives / Variations-of-SFANet-for-Crowd-Counting

The official implementation of "Encoder-Decoder Based Convolutional Neural Networks with Multi-Scale-Aware Modules for Crowd Counting"
https://ieeexplore.ieee.org/document/9413286
GNU General Public License v3.0
110 stars, 32 forks

Inconsistent shapes of conv4, conv5, feature in Backend()'s forwarding in MSegNet #20

Mozijie255 closed 2 years ago

Mozijie255 commented 3 years ago


Has anybody come across this problem when feeding a random input image to M-SegNet to generate a density map? Why does shape[2] of these three tensors (conv4, conv5, and feature) differ?

Mozijie255 commented 3 years ago

I think I got it. It seems that the input image's size must be divisible by 16 because of the downsampling settings; otherwise the shapes become inconsistent during upsampling.
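To make the mismatch concrete, here is a minimal sketch in plain Python that traces one spatial dimension through the encoder and decoder. It assumes each pooling step halves the size with floor rounding and each decoder step doubles it before merging with the matching encoder feature; the function names `pooled_sizes` and `first_mismatch` are hypothetical and not part of the repository.

```python
def pooled_sizes(size, num_pools=4):
    """Sizes of one spatial dimension after each 2x2, stride-2 pooling.

    Assumption: pooling rounds down (size -> size // 2).
    """
    sizes = [size]
    for _ in range(num_pools):
        sizes.append(sizes[-1] // 2)
    return sizes


def first_mismatch(size, num_pools=4):
    """Walk back up the decoder, doubling the deeper map at each stage.

    Returns (upsampled_size, skip_size) for the first stage where the two
    differ, or None if every stage lines up.
    """
    sizes = pooled_sizes(size, num_pools)
    deeper = reversed(sizes[1:])      # deepest map first
    shallower = reversed(sizes[:-1])  # the encoder map one level up
    for deep, shallow in zip(deeper, shallower):
        if deep * 2 != shallow:
            return deep * 2, shallow
    return None


print(pooled_sizes(100))    # [100, 50, 25, 12, 6]
print(first_mismatch(100))  # (24, 25)
print(first_mismatch(96))   # None
```

With a height of 100, the map of size 25 is pooled down to 12, so upsampling gives 24 and can no longer be concatenated with the 25-row feature. A height of 96 is divisible by 16 and lines up at every stage. Whether a given mismatch actually triggers an error depends on which encoder levels the decoder merges.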

Pongpisit-Thanasutives commented 3 years ago

@Mozijie255 Yes, you're right! It depends on how many times the max-pooling operation is applied in the encoder. Here it is applied 4 times, so the image size must be divisible by 2^4 = 16.
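One way to apply this rule is to snap each image dimension to a multiple of 2^n before inference, for example by cropping or resizing down, or by padding up. The helper below is only a sketch under that assumption; `valid_size` is a hypothetical name, and the choice between cropping, resizing, and padding is left to the caller.

```python
def valid_size(size, num_pools=4, round_up=False):
    """Nearest multiple of 2**num_pools at or below (or at or above) size."""
    factor = 2 ** num_pools
    if round_up:
        # Ceiling division written with integer arithmetic only.
        return -(-size // factor) * factor
    return (size // factor) * factor


print(valid_size(100))                 # 96
print(valid_size(100, round_up=True))  # 112
print(valid_size(96))                  # 96
```

For an image of 100 x 150 pixels and 4 pooling steps, this gives 96 x 144 when rounding down and 112 x 160 when rounding up.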