Sorry for the late reply.
Since this is a fully convolutional neural network, it should work with arbitrary image sizes. Is there a particular image dimension you've tried that gives you an error? If so, can you post the error message here?
Thanks for the reply :) I made a mistake concerning the image size; I am able to test it the same way I am training it. But I still don't get why the image resolution needs to be a multiple of 32. I can make it work with a resolution of 480x480, but when I use 500x500 I get the following error:
` File "train.py", line 386, in train prediction = model(images)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 491, in call result = self.forward(*input, **kwargs)
File "/home/usr/Documents/src/models/e_net.py", line 633, in forward x, max_indices2_0 = self.downsample2_0(x)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 491, in call result = self.forward(*input, **kwargs)
File "/home/usr/Documents/src/models/e_net.py", line 376, in forward main = torch.cat((main, padding), 1)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 62 and 63 in dimension 2 at /pytorch/aten/src/THC/generic/THCTensorMath.cu:111 `
Why does it have to be divisible by 32?
After looking into this a little more, the original implementation (https://github.com/e-lab/ENet-training) doesn't generalize to all input sizes. Since I followed the original work, my implementation also can't handle all input sizes properly.
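To make the error above more concrete, here is a minimal sketch of where the mismatch comes from. The exact kernel, stride, and padding values are assumptions for illustration, but the idea is that the downsampling bottleneck's two parallel branches round odd spatial sizes differently:

```python
import torch
import torch.nn as nn

# Sketch of the two parallel paths inside a downsampling bottleneck
# (channel counts and kernel/stride/padding values are assumptions for illustration).
main_branch = nn.MaxPool2d(kernel_size=2, stride=2)                 # floors odd sizes
ext_branch = nn.Conv2d(64, 16, kernel_size=3, stride=2, padding=1)  # rounds odd sizes up

# With a 500x500 input, the spatial size reaching the second downsampling
# stage is 500 -> 250 -> 125, i.e. odd.
x = torch.randn(1, 64, 125, 125)

main = main_branch(x)  # spatial size: floor((125 - 2) / 2) + 1 = 62
ext = ext_branch(x)    # spatial size: floor((125 + 2 - 3) / 2) + 1 = 63
print(main.shape)      # torch.Size([1, 64, 62, 62])
print(ext.shape)       # torch.Size([1, 16, 63, 63])

# The bottleneck pads the main branch's channels with a zero tensor whose
# spatial size comes from the extension branch, so the concatenation fails
# exactly as in the traceback: 62 vs 63 in dimension 2.
pad = torch.zeros(1, ext.size(1), ext.size(2), ext.size(3))
try:
    torch.cat((main, pad), dim=1)
except RuntimeError as err:
    print(err)  # Sizes of tensors must match except in dimension 1. Got 62 and 63 ...
```

A 480x480 input never hits this because the size stays even through every downsampling stage (480 -> 240 -> 120 -> ...); requiring the input to be a multiple of 32 simply guarantees the spatial size remains even through several halvings.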
I've made the necessary changes to the architecture so that it can handle all input sizes. The code can be found in the any_input_size branch.
Note that the pretrained weights cannot be used since the network architecture doesn't match. I currently don't have access to my desktop computer and therefore can't test the network's performance or provide pretrained weights for the architecture in that branch.
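In the meantime, if you want to keep the master-branch architecture and its pretrained weights, one common workaround (a sketch I haven't benchmarked; `predict_any_size` is just a hypothetical helper name) is to pad the input up to the next multiple of 32 before the forward pass and crop the prediction back to the original resolution afterwards:

```python
import torch
import torch.nn.functional as F

def predict_any_size(model, image, multiple=32):
    """Pad `image` (N, C, H, W) to the next multiple of `multiple`,
    run the model, and crop the prediction back to the original H x W."""
    h, w = image.size(2), image.size(3)
    pad_h = (multiple - h % multiple) % multiple
    pad_w = (multiple - w % multiple) % multiple
    # Pad on the right/bottom only so cropping back is straightforward.
    padded = F.pad(image, (0, pad_w, 0, pad_h))
    with torch.no_grad():
        prediction = model(padded)
    return prediction[:, :, :h, :w]
```

You would call it in the evaluation loop in place of `model(images)`, e.g. `prediction = predict_any_size(model, images)`; because the padding is applied only on the bottom and right, cropping with the original height and width keeps the predictions aligned with the ground truth.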
Hello there, I am trying to train with arbitrarily sized images. As mentioned by @tcwang0509 in https://github.com/NVIDIA/pix2pixHD/issues/52#issuecomment-414005406 concerning a similar network, the sizes need to be divisible by 32. As far as I noticed, this is also the case with this ENet implementation.
Why is this the case?
How can this be handled? During training this is not a problem for me, but when it comes to testing and validating my models, I simply cannot resize or randomly crop my test/val images, because I would like to compare the results. For example, I am training the model on PascalVOC 2012 with the images resized to 480x480, but at test time the images have arbitrary sizes. Is there any way of making such a network compatible with arbitrarily sized images?
I stumbled across this comment https://github.com/meetshah1995/pytorch-semseg/issues/43#issuecomment-357550284 concerning the UNet, which insists on using a padding of 1. But so far I have not had success with the ENet. Should padding be added to each of the modules in the encoder and decoder?
Desperately searching for help ;) Thanks in advance!