Why is the output downsampled by 4 compared with the input image?

HRNet / HRNet-Semantic-Segmentation

The OCR approach is rephrased as Segmentation Transformer: https://arxiv.org/abs/1909.11065. This is an official implementation of semantic segmentation for HRNet. https://arxiv.org/abs/1908.07919


Why is the output downsampled by 4 compared with the input image? #231

Open zhang-qiang-github opened 3 years ago

zhang-qiang-github commented 3 years ago

If the original image is (H, W), the output of this segmentation network is (H/4, W/4). To obtain a segmentation result at (H, W), do I need to upsample the output? Am I right? I think upsampling would give a coarse result.
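To make the upsampling step concrete, here is a minimal, dependency-free sketch of one common way to do it: bilinearly upsample the per-class score maps by 4 and only then take the per-pixel argmax. This is an illustration, not necessarily what this repository does. The helper names `bilinear_upsample` and `upsample_and_label` are hypothetical, and the half-pixel coordinate mapping is an assumption chosen to mirror the usual `align_corners=False` convention. In PyTorch the same step would typically be a call to `torch.nn.functional.interpolate` with `mode="bilinear"`.

```python
def bilinear_upsample(grid, scale):
    """Upsample a 2D list of floats by an integer factor (half-pixel centres)."""
    h, w = len(grid), len(grid[0])
    out = []
    for y in range(h * scale):
        # Map the output pixel centre back to input coordinates, clamped to the grid.
        sy = min(max((y + 0.5) / scale - 0.5, 0.0), h - 1)
        y0 = int(sy)
        y1 = min(y0 + 1, h - 1)
        fy = sy - y0
        row = []
        for x in range(w * scale):
            sx = min(max((x + 0.5) / scale - 0.5, 0.0), w - 1)
            x0 = int(sx)
            x1 = min(x0 + 1, w - 1)
            fx = sx - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
            bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def upsample_and_label(scores, scale=4):
    """scores: list of per-class 2D score maps at (H/scale, W/scale).

    Returns a 2D list of class indices at (H, W).
    """
    up = [bilinear_upsample(class_map, scale) for class_map in scores]
    height, width = len(up[0]), len(up[0][0])
    return [
        [max(range(len(up)), key=lambda c: up[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


# Toy example: two classes on a 2x2 score map with a vertical boundary.
scores = [
    [[1.0, 0.0], [1.0, 0.0]],  # class 0 scores
    [[0.0, 1.0], [0.0, 1.0]],  # class 1 scores
]
labels = upsample_and_label(scores, scale=4)
print(len(labels), len(labels[0]))
print(labels[0])
```

Interpolating the continuous scores before the argmax, rather than resizing the label map, places the class boundary at sub-pixel precision relative to the low-resolution grid. Detail finer than the (H/4, W/4) features still cannot be recovered this way.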

Why not make the network output (H, W) directly? For example, by adding a ConvTranspose layer as the last layer of the network.
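For reference, a small sketch of the size arithmetic such a layer would need. It uses the output-size formula documented for PyTorch's `ConvTranspose2d`. The function name `conv_transpose_out_size` and the 512-pixel input are hypothetical and chosen only for illustration.

```python
def conv_transpose_out_size(size_in, kernel_size, stride,
                            padding=0, output_padding=0, dilation=1):
    """Spatial output size of a transposed convolution along one dimension."""
    return ((size_in - 1) * stride
            - 2 * padding
            + dilation * (kernel_size - 1)
            + output_padding
            + 1)


# Assume a 512-pixel input side, so the network output side is 512 / 4 = 128.
low_res = 512 // 4

# Option A: a single layer with kernel_size=4, stride=4 restores the full size.
single = conv_transpose_out_size(low_res, kernel_size=4, stride=4)

# Option B: two layers with kernel_size=4, stride=2, padding=1 (each doubles the size).
step1 = conv_transpose_out_size(low_res, kernel_size=4, stride=2, padding=1)
step2 = conv_transpose_out_size(step1, kernel_size=4, stride=2, padding=1)

print(single, step1, step2)
```

Either configuration maps H/4 back to H. Whether a learned upsampling layer gives sharper results than plain interpolation is an empirical question that this arithmetic does not settle.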