TobyPDE / FRRN

Full Resolution Residual Networks for Semantic Image Segmentation
MIT License
278 stars, 93 forks

ValueError while building network for 400x600 images #10

Closed: gavinmh closed this issue 7 years ago

gavinmh commented 7 years ago

I'd like to train on 400x600 images, but I am encountering a problem.

ValueError: Mismatch: not all input shapes are the same

is raised in https://github.com/TobyPDE/FRRN/blob/master/dltools/architectures.py#L315 because autobahn.input_shapes = [(None, 32, 600, 400), (None, 32, 592, 400)]. Are there any constraints on the input dimensions?

TobyPDE commented 7 years ago

The architecture uses a sequence of pooling and unpooling layers. Each pooling layer reduces the spatial dimensions by a factor of two and each unpooling layer increases them by the same factor. This only works if both the height and the width of your image are divisible by 2^n, where n is the number of pooling layers that you use. Otherwise a pooling layer has to round an odd size, and the unpooled result no longer matches the full-resolution stream.
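As a rough illustration of that constraint, here is a minimal sketch of the size arithmetic. It assumes that each pooling layer rounds odd sizes down and each unpooling layer exactly doubles the size; the helper name `round_trip_size` is hypothetical and not part of dltools.

```python
def round_trip_size(size, num_pools):
    """Return the size after num_pools halvings (rounding down)
    followed by num_pools doublings."""
    pooled = size
    for _ in range(num_pools):
        pooled //= 2  # assumed: pooling floors odd sizes
    return pooled * 2 ** num_pools


# 600 survives three halvings (600 -> 300 -> 150 -> 75), but the fourth
# halving floors 75 to 37, and 37 * 16 = 592.
print(round_trip_size(600, 4))
print(round_trip_size(400, 4))
```

Under these assumptions, four pooling steps turn 600 into 592 while leaving 400 unchanged, which appears consistent with the two shapes reported in the error message.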

The provided model uses 5 pooling layers. Hence, your image dimensions should be divisible by 2^5 = 32. The easiest way to accomplish this is to pad your images with zeros to a size of 608x416, the next multiples of 32 above 600 and 400. You can do this using numpy.pad. Alternatively, you can change the architecture to have only three instead of five pooling layers. The dimensions then only need to be divisible by 8, which 600 and 400 already are, and training will also be faster.
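A minimal sketch of the padding step with numpy.pad might look like this. It assumes images are stored as (channels, height, width) arrays, matching the shapes in the error message; the helper name `pad_to_multiple` is hypothetical and not part of the repository.

```python
import numpy as np


def pad_to_multiple(image, multiple=32):
    """Zero-pad the last two axes (height, width) up to the next
    multiple of `multiple`. Padding is added at the bottom and right."""
    height, width = image.shape[-2:]
    pad_h = -height % multiple  # rows needed to reach the next multiple
    pad_w = -width % multiple   # columns needed to reach the next multiple
    # Leave any leading axes (e.g. channels) unpadded.
    pad_width = [(0, 0)] * (image.ndim - 2) + [(0, pad_h), (0, pad_w)]
    return np.pad(image, pad_width, mode="constant", constant_values=0)


image = np.ones((3, 600, 400), dtype=np.float32)
padded = pad_to_multiple(image)
print(padded.shape)  # (3, 608, 416)
```

The label maps would presumably need the same padding so that they stay aligned with the images. Which label value to use for the padded region depends on your training setup.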

gavinmh commented 7 years ago

That makes sense. Thanks @TobyPDE for your prompt help.

TobyPDE commented 7 years ago

If you need further assistance with your application, feel free to email me (tobias.pohlen@rwth-aachen.de).

Quick tip: If your dataset fits into main memory, I strongly advise you to load the entire dataset before training starts, as this might speed up training significantly.