Currently, we are wasting a lot of time on processing images, that are much larger than the objects, that we are interested in. Make the input variable sized to only contain the objects of interest.
Reason: For simplicity, we were limiting ourselves to a fixed image size. If we used Adaptive Global Average Pooling in the network, we could accept images of arbitrary size and therefore crop the images to only contain the two objects of interest (plus a little margin) instead of a fixed crop.
Currently, we are wasting a lot of time on processing images, that are much larger than the objects, that we are interested in. Make the input variable sized to only contain the objects of interest.
Reason: For simplicity, we were limiting ourselves to a fixed image size. If we used Adaptive Global Average Pooling in the network, we could accept images of arbitrary size and therefore crop the images to only contain the two objects of interest (plus a little margin) instead of a fixed crop.