CSAILVision / semantic-segmentation-pytorch

PyTorch implementation of semantic segmentation / scene parsing on the MIT ADE20K dataset
http://sceneparsing.csail.mit.edu/
BSD 3-Clause "New" or "Revised" License

Downsampling segmentation labels vs Upsampling predictions #209

Open aicaffeinelife opened 4 years ago

aicaffeinelife commented 4 years ago

Hi guys,

Thanks for this awesome repo. I'm writing some code for semantic segmentation and noticed that you downsample the labels instead of upsampling the predictions. This is not a code issue, but rather an open-ended question:

  1. Downsampling segmentation labels can discard spatially correlated pixels, errors on which the NLLLoss then never penalizes. Is downsampling justified simply because it consumes less memory?

  2. How would we go about implementing auxiliary losses, such as an SE loss or a patchwise pixel loss, if we downsample the segmentation labels?
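To make the trade-off concrete, here is a minimal sketch of the two strategies in PyTorch. All shapes and class counts are made up for illustration; the key detail is that integer label maps must be resized with nearest-neighbour interpolation (bilinear would blend class ids into meaningless values), whereas logits can be upsampled bilinearly:

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes: the network predicts at 1/4 of the label resolution.
logits = torch.randn(2, 19, 64, 64)            # (N, C, H/4, W/4)
labels = torch.randint(0, 19, (2, 256, 256))   # (N, H, W), integer class ids

# Option A: downsample labels to the prediction resolution (cheap).
# Nearest-neighbour keeps labels integral but drops thin structures.
small_labels = F.interpolate(labels[:, None].float(),
                             size=logits.shape[-2:],
                             mode="nearest").squeeze(1).long()
loss_a = F.cross_entropy(logits, small_labels)

# Option B: upsample logits to the label resolution (more memory,
# supervises every ground-truth pixel).
big_logits = F.interpolate(logits, size=labels.shape[-2:],
                           mode="bilinear", align_corners=False)
loss_b = F.cross_entropy(big_logits, labels)
```

In option A, any label region thinner than the downsampling stride can vanish from `small_labels` entirely, which is the loss of spatially correlated pixels described in point 1.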

hangzhaomit commented 4 years ago

This is a good question. The points you raise in 1. are exactly the trade-offs in this problem. A recent paper addresses it by sampling more points around edges, which is effective: https://arxiv.org/abs/1912.08193
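A minimal sketch of the idea of concentrating supervision near edges, not the paper's actual implementation: one common proxy for "near a boundary" is a small margin between the top-2 class scores at a pixel, and the loss can then be evaluated only at the most uncertain points. The function name and all shapes here are hypothetical:

```python
import torch

def uncertain_point_indices(logits, k):
    """Return flat indices of the k least-certain pixels per image.

    Uncertainty is the negative margin between the top-2 class scores;
    pixels near semantic boundaries tend to have small margins.
    """
    n, c, h, w = logits.shape
    flat = logits.view(n, c, h * w)              # (N, C, H*W)
    top2 = flat.topk(2, dim=1).values            # (N, 2, H*W)
    uncertainty = -(top2[:, 0] - top2[:, 1])     # higher = less certain
    return uncertainty.topk(k, dim=1).indices    # (N, k)

logits = torch.randn(1, 19, 32, 32)
idx = uncertain_point_indices(logits, k=64)
# A point-wise loss can then be computed only at these 64 locations
# instead of over the full 32x32 grid.
```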

aicaffeinelife commented 4 years ago

Thanks for bringing this very interesting paper to my attention. I haven't read it in depth yet, but from what I can tell, sampling points drawn from a uniform distribution can lead to better semantic boundaries without having to upsample the predictions. I believe this opens up a new space for exploration in highly efficient semantic segmentation architectures.

What are your thoughts on this?