Hi, I'm an undergrad student, and I'm using this dataset and code for my final project, which is to implement CoordConv into this code to build a "location-aware network". The layer is from here: https://github.com/mkocabas/CoordConv-pytorch
In a nutshell, the function adds location channels ((x, y) coordinates) to the input, which helps the network learn where objects are, so it can detect them more easily based on their position in the image.
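For reference, here is a minimal sketch of how I understand the layer (my own simplified version of the idea, not the exact code from the linked repo):

```python
import torch
import torch.nn as nn

class AddCoords(nn.Module):
    """Concatenate normalized (x, y) coordinate channels onto the input."""
    def forward(self, x):
        b, _, h, w = x.shape
        # One channel per axis, values normalized to [-1, 1]
        ys = torch.linspace(-1, 1, h, device=x.device).view(1, 1, h, 1).expand(b, 1, h, w)
        xs = torch.linspace(-1, 1, w, device=x.device).view(1, 1, 1, w).expand(b, 1, h, w)
        return torch.cat([x, xs, ys], dim=1)

class CoordConv(nn.Module):
    """Drop-in replacement for nn.Conv2d: add coord channels, then convolve."""
    def __init__(self, in_channels, out_channels, **kwargs):
        super().__init__()
        self.add_coords = AddCoords()
        # +2 input channels to account for the x and y coordinate maps
        self.conv = nn.Conv2d(in_channels + 2, out_channels, **kwargs)

    def forward(self, x):
        return self.conv(self.add_coords(x))
```

So a swap looks like replacing `nn.Conv2d(3, 64, kernel_size=3, padding=1)` with `CoordConv(3, 64, kernel_size=3, padding=1)` (channel counts here are just an example).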
After implementing the function and replacing the corresponding nn.Conv2d layers with CoordConv layers, the results I get are worse than without them. This is counter-intuitive, because you would assume that, in the worst case, the added coordinate channels would be treated as noise and wouldn't change the baseline results.
My question is: is the image data kept intact before being fed to the model? For example, is each image cut into several "mini-images", or is the spatial layout of the image changed in any other way? (If it is cut into mini-images, the spatial information isn't preserved: with a 2x2 split, the middle of the original image ends up near the bottom-left, bottom-right, top-right, or top-left corner of whichever mini-image it falls into. A toy example follows below.)
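To make that concrete, here is a toy example of what I mean (the 512 -> 256 split below is purely an assumption for illustration; I don't know what the dataset code actually does):

```python
import torch

# HYPOTHETICAL preprocessing -- an assumption to show the effect, not
# necessarily what the real loader does.
full = torch.rand(1, 3, 512, 512)        # toy full image
crop_a = full[..., 0:256, 0:256]         # top-left "mini-image"
crop_b = full[..., 128:384, 128:384]     # centered "mini-image"

def coord_grid(h, w):
    """The fresh [-1, 1] grid AddCoords stamps onto every input it sees."""
    ys = torch.linspace(-1, 1, h).view(h, 1).expand(h, w)
    xs = torch.linspace(-1, 1, w).view(1, w).expand(h, w)
    return xs, ys

xs, ys = coord_grid(256, 256)
# The scene pixel at full-image position (255, 255) sits at index (255, 255)
# in crop_a but at index (127, 127) in crop_b, so its coordinate channels
# report two different "locations" for the same piece of the scene:
print(xs[255, 255].item(), ys[255, 255].item())  # ~1.0, ~1.0     ("bottom-right")
print(xs[127, 127].item(), ys[127, 127].item())  # ~-0.004, ~-0.004  ("center")
```

If something like this happens in the pipeline, the (x, y) channels stop meaning "where this is in the scene" and become inconsistent labels the network has to fight against.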
If that is in fact the case, the drop in performance is somewhat understandable; if not, and you can shed some light on the issue I'm facing, it would be very much appreciated!
I tried to illustrate the problem I was talking about here:
Thanks a lot for your help!