DrSleep / tensorflow-deeplab-resnet

DeepLab-ResNet rebuilt in TensorFlow
MIT License

data augmentation compared with the original code #13

Closed gaopeng-eugene closed 7 years ago

gaopeng-eugene commented 7 years ago

Cropping and mirroring are the two techniques used in the original paper to prevent overfitting. I have two questions. (1) Is the crop implementation in your repository the same as the original one? You are using a central crop, right? (2) How can I implement the mirror augmentation? tf.image.random_flip_left_right cannot flip the label and the image simultaneously. Another problem with this operator is that the label shape (W,H,1) gets changed into (W,H,3).

Can you suggest a method for implementing the mirroring?

Best Wishes

gaopeng-eugene commented 7 years ago

In my opinion, the original multi-scale input is a kind of data augmentation. Without a data augmentation strategy, the trained ResNet model will suffer from overfitting.

DrSleep commented 7 years ago

1) I am not sure about the original crop implementation. In TF, the tf.image.resize_image_with_crop_or_pad function performs central cropping if the image size is larger than the crop size.

2) To mirror the image and the label, just add these two lines before splitting the image into channels: https://github.com/DrSleep/tensorflow-deeplab-resnet/blob/master/deeplab_resnet/image_reader.py#L73

img = tf.image.random_flip_left_right(img, seed=seed)
label = tf.image.random_flip_left_right(label, seed=seed)

The seed is defined above, and since both calls use the same seed, the image and the label will undergo the same transformation. As for the shape of the mask, this is a quote from the TF documentation on tf.image.random_flip_left_right:

Returns:

A 3-D tensor of the same type and shape as image.
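
For readers who would rather not depend on how seeds are handled, one alternative is to draw a single flip decision and apply it to both arrays. The sketch below shows the idea in NumPy rather than TensorFlow. `paired_random_flip` is a hypothetical helper name, not a function from this repository, and the (H, W, C) layout with a single-channel label is an assumption.

```python
import numpy as np


def paired_random_flip(img, label, rng):
    """Mirror img and label together with probability 0.5.

    Hypothetical helper: img is assumed to be (H, W, 3) and
    label (H, W, 1). Both keep their input shapes.
    """
    # One random draw decides the flip for both arrays,
    # so the image and its mask can never get out of sync.
    if rng.random() < 0.5:
        img = img[:, ::-1, :]      # reverse the width axis
        label = label[:, ::-1, :]
    return img, label


rng = np.random.default_rng(0)
img = np.arange(2 * 4 * 3).reshape(2, 4, 3)
label = np.arange(2 * 4).reshape(2, 4, 1)
out_img, out_label = paired_random_flip(img, label, rng)
```

In TensorFlow the same idea could probably be expressed by concatenating the image and the label along the channel axis, flipping once, and splitting them again.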

3) ImageReader in this repository also supports random scaling on the fly. It is not the same as multi-scale input, but it is another helpful augmentation strategy.
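
As a rough illustration of random scaling for segmentation data, the sketch below draws one scale factor and resizes the image and the label with it. This is not the ImageReader code: the helper names and the 0.5 to 1.5 range are assumptions, and it uses NumPy with nearest-neighbour resizing for both arrays to stay short. In practice the image would usually be resized with bilinear interpolation, while the label keeps nearest-neighbour so that class ids are never blended.

```python
import numpy as np


def resize_nearest(arr, new_h, new_w):
    """Nearest-neighbour resize of an (H, W, C) array."""
    h, w = arr.shape[:2]
    rows = np.arange(new_h) * h // new_h   # source row for each output row
    cols = np.arange(new_w) * w // new_w   # source column for each output column
    return arr[rows][:, cols]


def paired_random_scale(img, label, rng, lo=0.5, hi=1.5):
    """Scale img and label by one shared random factor (hypothetical helper)."""
    scale = rng.uniform(lo, hi)            # assumed range, for illustration only
    h, w = img.shape[:2]
    new_h = max(1, int(round(h * scale)))
    new_w = max(1, int(round(w * scale)))
    # Nearest-neighbour keeps every label value a valid class id.
    return (resize_nearest(img, new_h, new_w),
            resize_nearest(label, new_h, new_w))


rng = np.random.default_rng(0)
img = np.zeros((8, 10, 3), dtype=np.uint8)
label = np.random.default_rng(1).integers(0, 21, size=(8, 10, 1))
out_img, out_label = paired_random_scale(img, label, rng)
```

Because the output size changes from sample to sample, a crop or pad step to a fixed size would normally follow before batching.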