ZFTurbo / ZF_UNET_224_Pretrained_Model

Modification of the convolutional neural net "UNET" for image segmentation in the Keras framework
GNU General Public License v3.0

Working with arbitrary image size #10

Open · mrgloom opened this issue 6 years ago

mrgloom commented 6 years ago

To simplify things, let's consider a VGG16-based FCN-32s (not U-Net):

Code:

from keras.layers import Conv2D, Conv2DTranspose, MaxPooling2D, BatchNormalization

def get_fcn_vgg16_32s(inputs, n_classes, h, w):
    # h and w are unused here: the network is fully convolutional

    x = BatchNormalization()(inputs)

    # Block 1
    x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv1')(x)
    x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv2')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool')(x)

    # Block 2
    x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv1')(x)
    x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv2')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block2_pool')(x)

    # Block 3
    x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv1')(x)
    x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv2')(x)
    x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv3')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block3_pool')(x)

    # Block 4
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv1')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv2')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv3')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block4_pool')(x)

    # Block 5
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv1')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv2')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv3')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block5_pool')(x)

    x = Conv2D(512, (3, 3), activation='relu', padding='same')(x)

    x = Conv2DTranspose(n_classes, kernel_size=(64, 64), strides=(32, 32), activation='linear', padding='same')(x)

    return x

from keras.layers import Input, Activation
from keras.models import Model
from keras.optimizers import Adadelta
import models  # the module containing get_fcn_vgg16_32s above

def get_model():
    inputs = Input((IMAGE_H, IMAGE_W, INPUT_CHANNELS))

    base = models.get_fcn_vgg16_32s(inputs, NUMBER_OF_CLASSES, IMAGE_H, IMAGE_W)

    act = Activation('sigmoid')(base)

    model = Model(inputs=inputs, outputs=act)
    model.compile(optimizer=Adadelta(), loss='binary_crossentropy')

    #print(model.summary())
    #sys.exit()

    return model
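Since every layer above is convolutional (padding='same' throughout, no Dense layers), the same graph can also be built with an unspecified spatial size, which is what the issue title is about. A minimal sketch, reusing the imports above and assuming the Keras 2 functional API (get_model_any_size is a name I made up for illustration); the only constraint is that H and W must be divisible by 32, so that the five poolings and the stride-32 transposed convolution restore the exact input size:

def get_model_any_size(n_classes=1, input_channels=3):
    # H and W are left undefined; any size divisible by 32 works at run time
    inputs = Input((None, None, input_channels))
    base = models.get_fcn_vgg16_32s(inputs, n_classes, None, None)
    act = Activation('sigmoid')(base)
    model = Model(inputs=inputs, outputs=act)
    model.compile(optimizer=Adadelta(), loss='binary_crossentropy')
    return model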

Architecture (IMAGE_H = 32, IMAGE_W = 32, INPUT_CHANNELS = 3, NUMBER_OF_CLASSES = 1):

Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 32, 32, 3)         0         
_________________________________________________________________
batch_normalization_1 (Batch (None, 32, 32, 3)         12        
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 32, 32, 64)        1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 32, 32, 64)        36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 16, 16, 64)        0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 16, 16, 128)       73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 16, 16, 128)       147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 8, 8, 128)         0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 8, 8, 256)         295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 8, 8, 256)         590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 8, 8, 256)         590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 4, 4, 256)         0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 4, 4, 512)         1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 4, 4, 512)         2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 4, 4, 512)         2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 2, 2, 512)         0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 2, 2, 512)         2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 2, 2, 512)         2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 2, 2, 512)         2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 1, 1, 512)         0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 1, 1, 512)         2359808   
_________________________________________________________________
conv2d_transpose_1 (Conv2DTr (None, 32, 32, 1)         2097153   
_________________________________________________________________
activation_1 (Activation)    (None, 32, 32, 1)         0         
=================================================================
Total params: 19,171,661
Trainable params: 19,171,655
Non-trainable params: 6
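Note the spatial collapse visible in the summary: five 2x2 poolings divide each spatial dimension by 2^5 = 32, so the 32x32 input shrinks to a 1x1x512 bottleneck, and the stride-32 transposed convolution upsamples it straight back to 32x32.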

VGG16 was originally developed for a 224x224 input size. I have trained the network on synthetic data with different input sizes, ranging IMAGE_H, IMAGE_W from 32x32 to 512x512. The object is an ellipse that always fits inside the image.

Sample generation code:

import cv2
import numpy as np
import random as rn  # rn is assumed to be Python's built-in random module

def gen_random_image():
    img = np.zeros((IMAGE_H, IMAGE_W, INPUT_CHANNELS), dtype=np.uint8)
    mask = np.zeros((IMAGE_H, IMAGE_W, NUMBER_OF_CLASSES), dtype=np.uint8)

    colors = np.random.permutation(256)

    # Background
    img[:, :, 0] = colors[0]
    img[:, :, 1] = colors[1]
    img[:, :, 2] = colors[2]

    # Object class 1
    obj1_color0 = colors[3]
    obj1_color1 = colors[4]
    obj1_color2 = colors[5]
    while(True):
        center_x = rn.randint(0, IMAGE_W)
        center_y = rn.randint(0, IMAGE_H)
        r_x = rn.randint(10, 50)
        r_y = rn.randint(10, 50)
        if(center_x+r_x < IMAGE_W and center_x-r_x > 0 and center_y+r_y < IMAGE_H and center_y-r_y > 0):
            cv2.ellipse(img, (int(center_x), int(center_y)), (int(r_x), int(r_y)), int(0), int(0), int(360), (int(obj1_color0), int(obj1_color1), int(obj1_color2)), int(-1))
            cv2.ellipse(mask, (int(center_x), int(center_y)), (int(r_x), int(r_y)), int(0), int(0), int(360), int(255), int(-1))
            break

    # White noise
    density = rn.uniform(0, 0.1)
    for i in range(IMAGE_H):
        for j in range(IMAGE_W):
            if rn.random() < density:
                img[i, j, 0] = rn.randint(0, 255)
                img[i, j, 1] = rn.randint(0, 255)
                img[i, j, 2] = rn.randint(0, 255)

    return img, mask
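For reference, I feed these samples to the model through a simple generator; a minimal sketch (batch_generator and the hyperparameters in the commented call are illustrative, not part of the model code):

def batch_generator(batch_size):
    # Yields (images, masks) batches forever, rescaled to [0, 1] floats
    while True:
        image_list = []
        mask_list = []
        for _ in range(batch_size):
            img, mask = gen_random_image()
            image_list.append(img)
            mask_list.append(mask)
        yield (np.array(image_list, dtype=np.float32) / 255.0,
               np.array(mask_list, dtype=np.float32) / 255.0)

# model.fit_generator(batch_generator(16), steps_per_epoch=100, epochs=20)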

Obtained results (1st image is the input, 2nd is the ground-truth mask, 3rd is the predicted probability, 4th is the probability thresholded at 0.5):

32x32 (binary_segmentation_binary_crossentropy_result32_32): It works, and it's a little surprising to me that the network can reconstruct the ellipse shape from a blob of 1x1 spatial size.

64x64 (binary_segmentation_binary_crossentropy_result64_64): For some reason, at this input size I had to train the network longer; otherwise it converges to something like this: binary_segmentation_binary_crossentropy_result64_64_1

128x128 (binary_segmentation_binary_crossentropy_result128_128): Works well.

256x256 (binary_segmentation_binary_crossentropy_result256_256): Works well.

512x512 (binary_segmentation_binary_crossentropy_result512_512): Fails; it looks like the network learned only to predict all zeros.

So my question is: is this related to the receptive field size, or to class imbalance (lots of background pixels)? And how should I deal with this problem?
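As a sanity check on the receptive-field hypothesis, the theoretical receptive field of the encoder can be computed layer by layer with the standard recursion r_out = r_in + (k - 1) * j_in, j_out = j_in * s, where r is the receptive field, j the cumulative stride, and k, s the kernel size and stride of each layer. A back-of-the-envelope sketch for the architecture above (my own check, not verified against the trained model):

# (kernel, stride) per layer: blocks 1-2 (2 convs + pool each),
# blocks 3-5 (3 convs + pool each), plus the final 3x3 conv
layers = (([(3, 1)] * 2 + [(2, 2)]) * 2
          + ([(3, 1)] * 3 + [(2, 2)]) * 3
          + [(3, 1)])

r, j = 1, 1
for k, s in layers:
    r += (k - 1) * j
    j *= s
print(r, j)  # -> 276 32: each bottleneck unit sees about 276x276 input pixels

If this calculation is right, a ~276-pixel receptive field comfortably covers a 10-50 pixel-radius ellipse even at 512x512, which suggests the receptive field alone is probably not what breaks the 512x512 case.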

ZFTurbo commented 6 years ago

I think the problem in the 512x512 case may also be the learning rate. Try decreasing it.

mrgloom commented 6 years ago

Yes, that helped. I switched to Adam, and with lr=0.00001 it converges:

model.compile(optimizer=Adam(lr=0.00001), loss='binary_crossentropy')

binary_segmentation_binary_crossentropy_result512_512

Can you elaborate on why this helps?

ZFTurbo commented 6 years ago

It's just from my experience. You can check some materials about it: https://www.kdnuggets.com/2017/11/estimating-optimal-learning-rate-deep-neural-network.html

Deep learning rule of thumb: if the NN should converge but doesn't, try reducing the learning rate. :)
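One way to automate this in Keras is the ReduceLROnPlateau callback, which lowers the learning rate whenever the monitored loss stops improving; a minimal sketch (the factor, patience, and min_lr values here are illustrative):

from keras.callbacks import ReduceLROnPlateau

# Halve the learning rate after 5 epochs without improvement in the
# training loss, down to a floor of 1e-6
reduce_lr = ReduceLROnPlateau(monitor='loss', factor=0.5, patience=5,
                              min_lr=1e-6, verbose=1)

# model.fit_generator(batch_generator(16), steps_per_epoch=100,
#                     epochs=20, callbacks=[reduce_lr])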