feevos / resuneta

mxnet source code for the resuneta semantic segmentation models

About PSP pooling: according to the paper, the order is split, pooling, restore dimension and concatenate #2

Closed ChienWong closed 4 years ago

ChienWong commented 4 years ago

I don't know how to restore the dimension. Following PSPNet, I didn't use a split; I directly applied pooling, then conv(1,1), upsampling, and concatenation. The result is OK. Can you tell me how to implement the paper's solution, and what effect it has on the result?

feevos commented 4 years ago

Hi @mohuazheliu, I am not sure I completely understand your question(s). The dimension in height and width is restored by direct upsampling within the PSP pooling operator, as can be seen at line 63:

        p = [_input]
        for i in range(4): # four different pooling operations 

            pool_size = layer_size // (2**i) # Need this to be integer 
            x = F.Pooling(_input,kernel=[pool_size,pool_size],stride=[pool_size,pool_size],pool_type='max') # this performs pooling
            x = F.UpSampling(x,sample_type='nearest',scale=pool_size) # this restores the original height, width 
            x = self.convs[i](x) # this reduces the number of channels to a quarter of the original
            p += [x] # this adds the created output in the initial list 

        out = F.concat(p[0],p[1],p[2],p[3],p[4],dim=1) # this concatenates the input with all pooling branches, so the number of channels is double the initial number

        out = self.conv_norm_final(out) # this brings the number of channels back to the original input number of channels
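For reference, self.convs and self.conv_norm_final are 1x1 convolution layers defined in the block's constructor. A rough sketch of how such a setup could look (a sketch only, not the repo's exact definitions; nfilters stands for the number of input channels):

    from mxnet.gluon import nn

    # Hypothetical constructor fragment (not the repo's exact code):
    # one 1x1 conv per pooling scale, each reducing the channel count to
    # nfilters // 4, plus a final 1x1 conv + norm that maps the concatenated
    # 2 * nfilters channels back to nfilters.
    self.convs = nn.HybridSequential()
    for _ in range(4):
        self.convs.add(nn.Conv2D(nfilters // 4, kernel_size=1, use_bias=False))

    self.conv_norm_final = nn.HybridSequential()
    self.conv_norm_final.add(nn.Conv2D(nfilters, kernel_size=1, use_bias=False))
    self.conv_norm_final.add(nn.BatchNorm())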

Hope this helps. I don't understand what you mean by the paper solution, or the effect on the result. Can you please elaborate?

feevos commented 4 years ago

@mohuazheliu I was looking again at the manuscript, in particular Figure 1.c. There is a typographic mistake in Figure 1.c that I will correct, thank you for this. There is no splitting in channel space initially. The pooling happens over all channels, but at different strides. Then the upsampling restores the spatial dimensions (height, width). Then the convolution reduces the number of filters to 1/4 of the initial number. Then all of these are concatenated with the original input (therefore, in total, twice as many filters/channels), and finally a convolution reduces the number of channels back to the original input number of channels.
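To make the shape bookkeeping concrete, here is a small, self-contained sketch of that sequence (example values and layer names only, not the repo's exact code):

    import mxnet as mx
    from mxnet.gluon import nn

    nfilters, layer_size = 32, 64  # example channel count and spatial size (assumed values)
    x = mx.nd.random.uniform(shape=(1, nfilters, layer_size, layer_size))

    # 1x1 convs that quarter the channel count, one per pooling scale,
    # plus a final 1x1 conv that restores the original channel count.
    convs = [nn.Conv2D(nfilters // 4, kernel_size=1) for _ in range(4)]
    conv_final = nn.Conv2D(nfilters, kernel_size=1)
    for c in convs + [conv_final]:
        c.initialize()

    branches = [x]  # keep the original input for the concatenation
    for i in range(4):
        pool_size = layer_size // (2 ** i)  # pooling over all channels, at different strides
        p = mx.nd.Pooling(x, kernel=(pool_size, pool_size),
                          stride=(pool_size, pool_size), pool_type='max')
        p = mx.nd.UpSampling(p, scale=pool_size, sample_type='nearest')  # restore height, width
        branches.append(convs[i](p))  # each branch now has nfilters // 4 channels

    out = mx.nd.concat(*branches, dim=1)  # nfilters + 4 * (nfilters // 4) = 2 * nfilters channels
    out = conv_final(out)                 # back to nfilters channels
    print(out.shape)                      # (1, 32, 64, 64)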

Thank you for spotting this; I will correct the figure.