aleju / imgaug

Image augmentation for machine learning experiments.
http://imgaug.readthedocs.io
MIT License
14.42k stars 2.44k forks

Affine augmentation of segmentation map destroys segmentation map format #342

Open lewlin opened 5 years ago

lewlin commented 5 years ago

Hi all,

thank you very (very) much @aleju for the awesome library - really comprehensive and well documented.

I am running into trouble when augmenting segmentation maps. My segmentation maps have shape (H, W, C), where C is the number of classes. The dtype of the map is float and the only entries I use are 0. and 1. This is compatible with the SegmentationMapOnImage documentation. However, when I run an affine transformation on the segmentation map (e.g. a rotation), the resulting augmented map contains all sorts of values, not only zero and one.

For example see code below:

import numpy as np
import imgaug as ia
from imgaug import augmenters as iaa
assert ia.__version__ == '0.2.9'
# Generate a dummy all-zero image and a corresponding
# segmentation map volume with 5 classes
mask = np.zeros(shape=(10, 10, 5), dtype='float32')  # 5 classes
image = np.zeros(shape=(10, 10, 3))  # RGB image
# Assign a random class to every pixel of the segmentation map
ixs = np.random.randint(low=0, high=5, size=(10, 10))
for row in range(10):
    for col in range(10):
        channel = ixs[row, col]
        mask[row, col, channel] = 1
# Define affine augmenter
augmenter = iaa.Affine(rotate=(-20, 20))
# Augment mask
smap = ia.augmentables.segmaps.SegmentationMapOnImage(mask,
                                                      shape=image.shape)
smap_aug = augmenter.augment_segmentation_maps(smap)  # was `seg_map`, an undefined name
mask_aug = smap_aug.arr

To be a valid segmentation map, the sum along channels should be one for every pixel. Indeed this is what happens for the original map:

np.sum(mask, axis=-1)

For the augmented version, this no longer holds:

np.sum(mask_aug, axis=-1)

Do you think this might be a bug?

aleju commented 5 years ago

In the current version of the library, segmentation maps are represented as float arrays in which the last axis is comparable to a one-hot vector, one per pixel. The sum of each such vector does not have to be 1.0, as we only care about the maximum anyway, i.e. no normalization is required or performed. The only rules are that no component of a vector may lie outside the interval [0.0, 1.0] and that the argmax along each vector denotes the class index. There is a function -- I think it was called smap.get_arr_int() -- which returns that discrete class index representation, i.e. what people usually consider an (integer-based) segmentation map. In most cases it is sensible to use that function instead of accessing .arr directly.
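
To illustrate the argmax rule described above, here is a minimal NumPy-only sketch. It does not call imgaug: the array `soft` and its values are made up for illustration and merely mimic the kind of fractional values an augmented map might contain. Any background handling the library may apply when producing its integer map is ignored here.

```python
import numpy as np

# Hypothetical 2x2 map with 3 classes. The values are invented for
# illustration; they are not output produced by imgaug.
soft = np.array(
    [
        [[0.7, 0.2, 0.0], [0.1, 0.6, 0.5]],
        [[0.0, 0.0, 0.9], [0.3, 0.3, 0.8]],
    ],
    dtype=np.float32,
)

# Per-pixel sums along the class axis are not required to equal 1.0.
sums = soft.sum(axis=-1)

# The class of each pixel is the index of the largest component.
class_idx = np.argmax(soft, axis=-1)  # shape (H, W), integer class ids

# If a strict one-hot map is needed again, rebuild it from the indices.
one_hot = np.eye(soft.shape[-1], dtype=np.float32)[class_idx]

print(sums)
print(class_idx)
print(one_hot.sum(axis=-1))
```

The rebuilt `one_hot` array again sums to one along the class axis for every pixel, which is the property the original question was checking for.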

The system is currently quite confusing (and also inefficient), which is why it is going to be replaced by integer maps in the next version of the library.