GeorgeSeif / Semantic-Segmentation-Suite

Semantic Segmentation Suite in TensorFlow. Implement, train, and test new Semantic Segmentation models easily!

About RefineNet detail #162

Closed Spritea closed 5 years ago

Spritea commented 5 years ago


Hi, George! I'm here again~ Seems you've been busy these days~

I've been reading the RefineNet code carefully, and there is one part I don't get the reason for. When you take the pool2 to pool5 outputs from ResNet, you change their channel counts to 256, 256, 256, and 512. In the original paper, Fig. 2 (a) and (c) show that the channel number of the feature map doubles after every pooling op. That is consistent with the original ResNet paper (see Table 1), which gives 256, 512, 1024, 2048 from pool2 to pool5.

So I'm curious why we need to change the channel numbers here, and make three of them the same.

BTW, I found another implementation here that does something similar, although it changes all 4 layers' channels to 256. Small difference, huh~

Thanks~

Source code / logs



```python
high = [end_points['pool5'], end_points['pool4'], end_points['pool3'], end_points['pool2']]
low = [None, None, None, None]

# Get the feature maps to the proper size with bottleneck
high[0] = slim.conv2d(high[0], 512, 1)
high[1] = slim.conv2d(high[1], 256, 1)
high[2] = slim.conv2d(high[2], 256, 1)
high[3] = slim.conv2d(high[3], 256, 1)
```
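For readers without a TF1/slim environment, the effect of those 1×1 bottleneck convolutions can be sketched in plain NumPy: a 1×1 convolution is just a per-pixel linear map over the channel axis. The `conv1x1` helper and the spatial sizes below are illustrative assumptions (for a 224×224 input), not code from the repo:

```python
import numpy as np

def conv1x1(x, out_channels, rng):
    # A 1x1 convolution acts independently at each spatial position:
    # it is a linear map C_in -> C_out applied over the channel axis.
    h, w, c_in = x.shape
    weights = rng.standard_normal((c_in, out_channels))
    return (x.reshape(h * w, c_in) @ weights).reshape(h, w, out_channels)

rng = np.random.default_rng(0)

# Stand-ins for the ResNet end points; channels double after each stage
# (256, 512, 1024, 2048), as in Table 1 of the ResNet paper.
pools = {
    'pool2': rng.standard_normal((56, 56, 256)),
    'pool3': rng.standard_normal((28, 28, 512)),
    'pool4': rng.standard_normal((14, 14, 1024)),
    'pool5': rng.standard_normal((7, 7, 2048)),
}

# Bottleneck to the filter counts used in the snippet above:
# 512 for pool5 (feeding RefineNet-4), 256 for the rest.
targets = {'pool5': 512, 'pool4': 256, 'pool3': 256, 'pool2': 256}
adapted = {name: conv1x1(fmap, targets[name], rng) for name, fmap in pools.items()}

for name in ('pool2', 'pool3', 'pool4', 'pool5'):
    print(name, adapted[name].shape)
```

In the real network the 1×1 kernels are learned parameters; the random matrices here only demonstrate the shape change.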
lukasuz commented 5 years ago

Hey there! I stumbled upon this as well after re-implementing RefineNet. The authors state in Section 3.2, in the paragraph about the residual convolution unit: "The filter number for each input path is set to 512 for RefineNet-4 and 256 for the remaining ones in our experiments." If you used the end points directly from the ResNet without the 1x1 convolution, the dimensions would not match and summation of the layers would not be possible. And as you can see from the quote, RefineNet block 4 is supposed to have twice as many filters as the others. I could not find this explicit dimension adaptation in the paper either, but I guess they must have done it the same way.

Spritea commented 5 years ago

Nice catch! @lukasuz

Thanks.