The OCR approach is rephrased as Segmentation Transformer: https://arxiv.org/abs/1909.11065. This is an official implementation of semantic segmentation for HRNet. https://arxiv.org/abs/1908.07919
The effect of bilinear upsampling for the final segmentation stage #138
Hi, in your HRNet, the output segmentation map is predicted at 1/4 of the raw image size, and bilinear upsampling is then used to generate the final segmentation map. I am wondering why you do not generate the output map at the same size as the raw image, since the upsampling operation may introduce spatial errors. Is it a GPU memory issue?
You are right.
If the convolutions operated on features at the original image size, both the GPU memory cost and the computational complexity would be very high. We have not tried it.
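To put a rough number on that cost, here is a minimal back-of-the-envelope sketch in plain Python. It is not taken from the HRNet code: the helper `feature_map_cost`, the 1024x2048 input size, the 48 channels, the 3x3 kernel, and float32 activations are all illustrative assumptions. It only compares one convolution layer at 1/4 resolution against the same layer at full resolution.

```python
def feature_map_cost(height, width, channels, stride=1, kernel=3, bytes_per_value=4):
    """Estimate the cost of one conv layer (hypothetical helper, not HRNet code).

    Assumes a square kernel, equal input and output channel counts,
    and float32 activations. Returns (activation_bytes, multiply_accumulates).
    """
    h, w = height // stride, width // stride
    activation_bytes = h * w * channels * bytes_per_value
    macs = h * w * channels * channels * kernel * kernel
    return activation_bytes, macs


# Illustrative numbers only: a 1024x2048 input and 48 channels.
quarter = feature_map_cost(1024, 2048, 48, stride=4)  # features at 1/4 resolution
full = feature_map_cost(1024, 2048, 48, stride=1)     # features at full resolution

memory_ratio = full[0] / quarter[0]
mac_ratio = full[1] / quarter[1]

print(f"1/4 resolution: {quarter[0] / 2**20:.0f} MiB of activations per layer")
print(f"full resolution: {full[0] / 2**20:.0f} MiB of activations per layer")
print(f"memory ratio: {memory_ratio:.0f}x, compute ratio: {mac_ratio:.0f}x")
```

Under these assumptions, both the activation memory and the multiply-accumulate count scale with the number of spatial positions, so moving from 1/4 resolution to full resolution multiplies each by 4 x 4 = 16 per layer. Training adds further memory for gradients, which this sketch ignores.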