gaopeng-eugene closed 7 years ago
In my opinion, the original multi-scale input is a kind of data augmentation. Without a data augmentation strategy, the trained ResNet model will suffer from overfitting.
1) I am not sure about the original crop implementation. In TF, the tf.image.resize_image_with_crop_or_pad function performs central cropping if the image size is bigger than the crop size.
2) To mirror the image and the label, just add these two lines before splitting the image into the channels: https://github.com/DrSleep/tensorflow-deeplab-resnet/blob/master/deeplab_resnet/image_reader.py#L73
img = tf.image.random_flip_left_right(img, seed=seed)
label = tf.image.random_flip_left_right(label, seed=seed)
The seed is defined above, and since the seeds used are the same, the image and the label will undergo the same transformation.
As for the shape of the mask, this is a quote from the TF documentation on image.random_flip_left_right:
Returns:
A 3-D tensor of the same type and shape as image.
3) ImageReader in this repository also supports random scaling on the fly. It is not the same as multi-scale input, but it is another helpful augmentation strategy.
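
For illustration, here is a minimal NumPy sketch of the three augmentations discussed above (mirroring, random scaling, central crop or pad), applied jointly to an image and its label. It is not the code from this repository or from the original paper. The function names, the (H, W, C) layout, the scale range and the use of nearest-neighbour sampling for the image are all assumptions made for the example. A real pipeline would usually interpolate the image and keep nearest-neighbour only for the label.

```python
import numpy as np


def random_flip_pair(img, label, rng):
    """Mirror image and label together, driven by ONE random decision."""
    if rng.random() < 0.5:
        # Axis 1 is the width axis for (H, W, C) arrays.
        img = img[:, ::-1, :]
        label = label[:, ::-1, :]
    return img, label


def random_scale_pair(img, label, rng, low=0.5, high=1.5):
    """Rescale image and label by one shared random factor.

    Nearest-neighbour sampling keeps label values valid class ids.
    The range [low, high] is an assumption, not a value from the thread.
    """
    scale = rng.uniform(low, high)
    h, w = img.shape[:2]
    new_h = max(1, int(round(h * scale)))
    new_w = max(1, int(round(w * scale)))
    # Map every output row/column back to its source index.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    return img[rows][:, cols], label[rows][:, cols]


def center_crop_or_pad(arr, out_h, out_w, pad_value=0):
    """Crop centrally where arr is too large, pad evenly where too small."""
    h, w = arr.shape[:2]
    top = max((h - out_h) // 2, 0)
    left = max((w - out_w) // 2, 0)
    arr = arr[top:top + min(h, out_h), left:left + min(w, out_w)]
    h, w = arr.shape[:2]
    pad_top = (out_h - h) // 2
    pad_left = (out_w - w) // 2
    pads = [(pad_top, out_h - h - pad_top),
            (pad_left, out_w - w - pad_left)]
    pads += [(0, 0)] * (arr.ndim - 2)  # never pad the channel axis
    return np.pad(arr, pads, mode="constant", constant_values=pad_value)


rng = np.random.default_rng(1234)
img = np.zeros((4, 6, 3), dtype=np.float32)
label = np.zeros((4, 6, 1), dtype=np.int32)
img, label = random_flip_pair(img, label, rng)
print(label.shape)  # the label keeps its single channel: (4, 6, 1)
```

Because the flip and the scale each draw a single random value that is applied to both arrays, the image and the label cannot get out of sync, and the label keeps its single channel throughout. When padding labels, a dedicated ignore value for pad_value may be preferable to 0, depending on how the loss treats padded pixels.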
Cropping and mirroring are the two techniques used in the original paper to prevent overfitting. I have two questions here. (1) Is the crop implementation in your repository the same as the original one? You are using a central crop, right? (2) How should the mirror augmentation be implemented? tf.image.random_flip_left_right cannot flip the label and the image simultaneously. Another problem with this operator is that the label of shape (W,H,1) is changed into (W,H,3).
Can you suggest a method for implementing mirroring?
Best Wishes