CSAILVision / semantic-segmentation-pytorch

Pytorch implementation for Semantic Segmentation/Scene Parsing on MIT ADE20K dataset
http://sceneparsing.csail.mit.edu/
BSD 3-Clause "New" or "Revised" License
4.95k stars, 1.1k forks

trying to implement CoordConv and getting worse results #250

Open YossiSella opened 4 years ago

YossiSella commented 4 years ago

Hi, I'm an undergrad student and I'm using this dataset and code for my final project, in which I'm trying to add CoordConv to this code to build a "location-aware network". The CoordConv implementation I'm using is here: https://github.com/mkocabas/CoordConv-pytorch

In short, CoordConv adds coordinate channels (the (x, y) position of each pixel) to the input of a convolution, which helps the network learn where objects are, so it can detect them more easily based on their location in the image.
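For readers unfamiliar with the idea, here is a minimal sketch of a coordinate-augmented convolution in PyTorch. It is not the code from the linked repository: the names `add_coords` and `CoordConv2d` are hypothetical, and the choice to normalize coordinates to [-1, 1] is an assumption made for illustration.

```python
import torch
import torch.nn as nn


def add_coords(x: torch.Tensor) -> torch.Tensor:
    """Append normalized x and y coordinate channels to a (N, C, H, W) tensor."""
    n, _, h, w = x.shape
    # Assumption: coordinates run from -1 to 1 along each axis.
    ys = torch.linspace(-1.0, 1.0, h, device=x.device, dtype=x.dtype)
    xs = torch.linspace(-1.0, 1.0, w, device=x.device, dtype=x.dtype)
    yy = ys.view(1, 1, h, 1).expand(n, 1, h, w)  # varies along height
    xx = xs.view(1, 1, 1, w).expand(n, 1, h, w)  # varies along width
    return torch.cat([x, xx, yy], dim=1)


class CoordConv2d(nn.Module):
    """Hypothetical stand-in for nn.Conv2d that also sees two coordinate channels."""

    def __init__(self, in_channels: int, out_channels: int, **conv_kwargs):
        super().__init__()
        # The wrapped convolution takes two extra input channels (x and y).
        self.conv = nn.Conv2d(in_channels + 2, out_channels, **conv_kwargs)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.conv(add_coords(x))


layer = CoordConv2d(3, 8, kernel_size=3, padding=1)
out = layer(torch.zeros(2, 3, 16, 24))  # shape: (2, 8, 16, 24)
```

Note that the coordinates are computed from whatever tensor reaches the layer, so they describe positions in the (possibly resized, flipped, or cropped) input, not in the original image.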

After adding CoordConv to the code and replacing the corresponding nn.Conv2d layers with CoordConv layers, the results I get are worse than without them. This is counter-intuitive: you would expect that, in the worst case, the network would treat the added channels as noise and the baseline results would stay the same.
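One way to test that intuition is to check when a coordinate-augmented convolution reproduces the plain one exactly. The sketch below assumes two extra input channels and shows that the outputs match only if the original filters are copied over and the weights on the coordinate channels are zero. With default (random) initialization, or if existing weights no longer fit the changed input shape, the layer does not start from the baseline. Whether this applies to a given training setup is an assumption, not something confirmed in this thread.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

plain = nn.Conv2d(3, 8, kernel_size=3, padding=1)
coord = nn.Conv2d(3 + 2, 8, kernel_size=3, padding=1)  # 2 extra coordinate channels

with torch.no_grad():
    coord.weight.zero_()                 # coordinate channels start with zero weight
    coord.weight[:, :3] = plain.weight   # reuse the original filters for the image channels
    coord.bias.copy_(plain.bias)

x = torch.randn(1, 3, 16, 16)
# Illustrative coordinate channels; their values do not matter while their weights are zero.
ys = torch.linspace(-1, 1, 16).view(1, 1, 16, 1).expand(1, 1, 16, 16)
xs = torch.linspace(-1, 1, 16).view(1, 1, 1, 16).expand(1, 1, 16, 16)
x_with_coords = torch.cat([x, xs, ys], dim=1)

# True only because of the weight copy and zero initialization above.
same = torch.allclose(plain(x), coord(x_with_coords), atol=1e-5)
```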

My question is: is the image kept spatially intact before it is loaded into the model? Or is it, for example, cut into several "mini-images", or changed in some other way that affects its spatial layout? (For instance, if it is cut into 4 mini-images, the spatial information isn't preserved: the middle of the original image ends up in a corner of each mini-image, at the bottom-right, bottom-left, top-left, or top-right.)
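The concern can be made concrete with a little arithmetic. The sketch below assumes a hypothetical 512x512 image split into four 256x256 quadrants, with coordinates normalized to [-1, 1] per input; none of these numbers come from the repository. A pixel near the center of the full image gets a coordinate near 0, but the same pixel lands at the top-left corner of the bottom-right quadrant, where its coordinate becomes -1.

```python
def normalized(pos: int, size: int) -> float:
    """Map a pixel index in [0, size - 1] to the range [-1, 1]."""
    return 2.0 * pos / (size - 1) - 1.0


# Hypothetical full image size and a pixel near its center.
full_w = full_h = 512
px, py = 256, 256

# Coordinate of the pixel when the whole image is fed to the network.
full_coord = (normalized(px, full_w), normalized(py, full_h))

# Hypothetical bottom-right quadrant: starts at (256, 256), size 256x256.
crop_x0, crop_y0, crop_w, crop_h = 256, 256, 256, 256

# Coordinate of the same pixel when only the crop is fed to the network.
crop_coord = (normalized(px - crop_x0, crop_w), normalized(py - crop_y0, crop_h))

print("full image:", full_coord)
print("crop:      ", crop_coord)
```

The same mismatch appears, to a lesser degree, with any transform that moves content relative to the frame, such as random flipping, rescaling, or padding.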

If that is the case, the drop in performance is somewhat understandable. If not, and you think you can shed some light on the issue I'm facing, it would be much appreciated!

I tried to illustrate the problem I'm describing here: git question

Thanks a lot for your help!

hangzhaomit commented 4 years ago

You might want to take a look at how data augmentation is done: https://github.com/CSAILVision/semantic-segmentation-pytorch/blob/master/mit_semseg/dataset.py#L110