Closed gavinmh closed 7 years ago
The architecture uses a sequence of pooling and unpooling layers. Each pooling layer halves the spatial dimensions and each unpooling layer doubles them. This only works if each spatial dimension of your image is divisible by 2^n, where n is the number of pooling layers you use.
The provided model uses 5 pooling layers, so each image dimension must be divisible by 2^5 = 32. The easiest way to accomplish this is to zero-pad your images to a size of 608x416, which you can do with numpy.pad. Alternatively, you can change the architecture to use three pooling layers instead of five (training will also be faster in that case).
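A minimal sketch of the padding step with numpy.pad (the helper name and the assumption of HxW or HxWxC array layout are mine, not from the repo):

```python
import numpy as np

def pad_to_multiple(image, multiple=32):
    """Zero-pad the spatial dimensions (H, W) of `image` up to the
    next multiple of `multiple`. Assumes a 2D (H, W) or 3D (H, W, C) array."""
    h, w = image.shape[:2]
    pad_h = (-h) % multiple  # rows to add so that H becomes a multiple
    pad_w = (-w) % multiple  # columns to add so that W becomes a multiple
    # Pad only at the bottom/right; leave any channel axis untouched.
    pad_width = [(0, pad_h), (0, pad_w)] + [(0, 0)] * (image.ndim - 2)
    return np.pad(image, pad_width, mode="constant", constant_values=0)

img = np.ones((600, 400, 3), dtype=np.float32)
padded = pad_to_multiple(img)
print(padded.shape)  # (608, 416, 3)
```

Remember to crop the network output back to the original 600x400 region before evaluating, since the padded border carries no label information.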
That makes sense. Thanks @TobyPDE for your prompt help.
If you need further assistance with your application, feel free to email me (tobias.pohlen@rwth-aachen.de).
Quick tip: if your dataset fits into main memory, I strongly advise loading the entire set in advance, as this can speed up training significantly.
I'd like to train on 400x600 images, but I am encountering a problem.
An error is raised in https://github.com/TobyPDE/FRRN/blob/master/dltools/architectures.py#L315 because the input shapes do not match:
autobahn.input_shapes = [(None, 32, 600, 400), (None, 32, 592, 400)]
Are there any constraints on the input dimensions?