jwwangchn opened this issue 6 years ago:

The COCO dataset labels objects with both bounding boxes and instance segmentation masks. I want to apply rotation augmentation to the image, the bounding boxes, and the instance segmentation masks together. How can I do this? Thanks.
There are functions in the library to augment images, segmentation maps and bounding boxes. You will have to convert the segmentation maps to `imgaug.SegmentationMapOnImage` and the bounding boxes to `imgaug.BoundingBoxesOnImage`.

Rotation can be done via `Affine(rotate=(lower bound in degrees, upper bound in degrees))`. You have to call `to_deterministic()` on the augmenter before augmenting images, segmentation maps and bounding boxes to achieve the same rotation (in degrees) over all input types per image.
The rough outline of the training loop is something like:
```python
rot = iaa.Affine(rotate=(-30, 30))
for batch in batches:
    rot_det = rot.to_deterministic()
    images = batch.images    # should return e.g. a list of uint8 numpy arrays of shape (H, W, C)
    segmaps = batch.segmaps  # should return a list of imgaug.SegmentationMapOnImage
    bbs = batch.bbs          # should return a list of imgaug.BoundingBoxesOnImage
    images_aug = rot_det.augment_images(images)
    segmaps_aug = rot_det.augment_segmentation_maps(segmaps)
    bbs_aug = rot_det.augment_bounding_boxes(bbs)
    # train command here
```
@aleju Thanks for your answer, but in the COCO dataset the instance segmentation mask is in polygon format, which represents the mask by points located on its boundary rather than as a segmentation map. I think this format is similar to a bounding box, which is also represented by four points. So I want to know: how do I rotate a mask in this format?
The library does not yet support direct augmentation of polygons, as that is fairly hard to implement.
You can transform the polygon to a list of `imgaug.Keypoint` objects, then use `augmenter.augment_keypoints()` to transform these and at the end recreate your polygon (that's how bounding box augmentation works). That will work with e.g. `Affine` and I think `PerspectiveTransform` (and obviously with all augmenters not affecting the geometry of images). But it might not work for `PiecewiseAffine` or `ElasticTransformation`, because the new locations of the keypoints might be such that you can no longer create a valid polygon from them (i.e. one that does not intersect itself).
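For concreteness, a minimal sketch of that keypoint round-trip (the flat `[x1, y1, x2, y2, ...]` COCO-style list, the image shape and the rotation range are placeholders, not part of the answer above):

```python
import imgaug as ia
from imgaug import augmenters as iaa

# hypothetical COCO-style segmentation polygon: flat [x1, y1, x2, y2, ...] list
coco_poly = [10, 10, 120, 15, 110, 90, 20, 80]

# turn the polygon points into keypoints on the corresponding image
kps = ia.KeypointsOnImage(
    [ia.Keypoint(x=x, y=y) for x, y in zip(coco_poly[0::2], coco_poly[1::2])],
    shape=(128, 128, 3))

rot = iaa.Affine(rotate=(-30, 30)).to_deterministic()
kps_aug = rot.augment_keypoints([kps])[0]

# recreate the flat polygon from the augmented keypoints
coco_poly_aug = [c for kp in kps_aug.keypoints for c in (kp.x, kp.y)]
```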
The alternative to the keypoint-based augmentation is to convert the polygons to segmentation maps or masks. You can convert the COCO polygons to `imgaug.Polygon([(x, y), (x, y), ...])` and then use that object's `Polygon.draw_on_image(np.zeros((*image.shape[0:2], 3), dtype=np.uint8), color=[255, 255, 255], color_perimeter=[255, 255, 255])` to create a mask for that polygon with the same size as the image given by `image`. Then you can create a segmentation map object via `imgaug.SegmentationMapOnImage(mask / 255.0, shape=image.shape)`. The segmentation map can then be augmented via `augmenter.augment_segmentation_maps(segmaps)`. Recreating a polygon from the augmented mask wouldn't be completely trivial, but depending on how your model works you might be operating on a mask/map level anyway and hence not even need that.
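A sketch of that mask-based route, following the calls quoted above (the polygon coordinates, the dummy `image` and the rotation augmenter are placeholders):

```python
import numpy as np
import imgaug as ia
from imgaug import augmenters as iaa

image = np.zeros((128, 128, 3), dtype=np.uint8)  # stand-in for your real image

# hypothetical polygon from a COCO annotation, as (x, y) points
polygon = ia.Polygon([(10, 10), (120, 15), (110, 90), (20, 80)])

# draw the polygon as a white-on-black mask of the same size as the image
mask = polygon.draw_on_image(
    np.zeros((*image.shape[0:2], 3), dtype=np.uint8),
    color=[255, 255, 255], color_perimeter=[255, 255, 255])

# one channel suffices for a single-class segmentation map
segmap = ia.SegmentationMapOnImage(mask[:, :, 0] / 255.0, shape=image.shape)

rot = iaa.Affine(rotate=(-30, 30)).to_deterministic()
image_aug = rot.augment_image(image)
segmap_aug = rot.augment_segmentation_maps([segmap])[0]
```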
@aleju How about the VOC dataset, which has only bounding boxes?
Do you mean bounding boxes? In that case, see here for documentation on that.
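A minimal sketch with the same deterministic-augmenter pattern (the box coordinates and the dummy image are placeholders):

```python
import numpy as np
import imgaug as ia
from imgaug import augmenters as iaa

image = np.zeros((128, 128, 3), dtype=np.uint8)  # stand-in for your real image

# hypothetical VOC-style box given as (xmin, ymin, xmax, ymax)
bbs = ia.BoundingBoxesOnImage(
    [ia.BoundingBox(x1=20, y1=30, x2=80, y2=90)],
    shape=image.shape)

rot = iaa.Affine(rotate=(-30, 30)).to_deterministic()
image_aug = rot.augment_image(image)
bbs_aug = rot.augment_bounding_boxes([bbs])[0]
```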
I've created a script to augment and rotate both the images and their masks; is that what you're looking for?

If so, my directories are structured as:

```
image_name/images/image_name.png
image_name/masks/mask_name.png
```

This script creates three new rotations and three new augmentations for each input image. Each output image receives up to three augmentations, as applying more than this tends to introduce noise. For rotations, the masks are adjusted accordingly; for augmentations, the masks are left unmodified. You can easily adjust as needed.

I have not yet parallelized or optimized this, but I'm happy to hear suggestions.

It should be easy to modify the directory structure in the following code to match your own:
```python
from imgaug import augmenters as iaa
import numpy as np
import imageio
import os
import argparse

# To run:
# python3.6 augment.py -id /path/to/img/folder -od /path_to_output_dir > ./augmentation.log &

parser = argparse.ArgumentParser(description='Create an augmented data set.')
parser.add_argument('-id', '--input_dir', type=str, required=True,
                    help='Base directory of the images to be augmented.')
parser.add_argument('-od', '--output_dir', type=str, required=True,
                    help='Base directory of the output.')
args = parser.parse_args()

# store images and their related masks as a dictionary
img_dict = {}
for image_name in os.listdir(args.input_dir):
    # absolute path to the image
    img = "{}.png".format(os.path.join(os.path.abspath(args.input_dir),
                                       image_name, "images", image_name))
    # list of mask files for this image
    mask_folder = os.path.join(os.path.abspath(args.input_dir), image_name, "masks")
    mask_file_list = [os.path.join(mask_folder, mask)
                      for mask in os.listdir(mask_folder)]
    img_dict[image_name] = {
        "image": img,
        "masks": mask_file_list
    }

# create the augmentation sequences
rotators = iaa.SomeOf((1, 3), [
    iaa.Fliplr(1),  # flip/mirror input images horizontally
    iaa.Flipud(1),  # flip/mirror input images vertically
    iaa.Rot90(1)    # rotate input images by 90 degrees
], random_order=True)

transformers = iaa.SomeOf((1, 3), [
    iaa.Superpixels(p_replace=0.5, n_segments=64),  # create superpixel representation
    iaa.GaussianBlur(sigma=(0.0, 5.0)),
    iaa.AverageBlur(k=(2, 7)),  # blur image using local means with kernel sizes between 2 and 7
    iaa.MedianBlur(k=(3, 11)),  # blur image using local medians with kernel sizes between 3 and 11
    iaa.ElasticTransformation(alpha=(0, 5.0), sigma=0.25),  # distort pixels
    # either change the brightness of the whole image (sometimes
    # per channel) or change the brightness of subareas
    iaa.OneOf([
        iaa.Multiply((0.5, 1.5), per_channel=0.5),
        iaa.FrequencyNoiseAlpha(
            exponent=(-4, 0),
            first=iaa.Multiply((0.5, 1.5), per_channel=True),
            second=iaa.ContrastNormalization((0.5, 2.0))
        )
    ])
], random_order=True)

# go through all the images and create a set of augmented images and masks for each
for key, val in img_dict.items():
    # read in the image and its masks
    base_image = imageio.imread(val["image"]).astype(np.uint8)
    base_masks = [imageio.imread(mask).astype(np.uint8) for mask in val["masks"]]

    # loop through and rotate each image three times
    counter = 0
    while counter < 3:
        # apply the same random rotation to the image and its masks
        rotator = rotators.to_deterministic()
        image_aug = rotator.augment_image(base_image)
        masks_aug = rotator.augment_images(base_masks)
        counter += 1
        # create the output directories
        output_dir = "{}/{}_rot{}".format(args.output_dir, key, counter)
        out_img_dir = "{}/{}".format(output_dir, "images")
        out_mask_dir = "{}/{}".format(output_dir, "masks")
        os.makedirs(out_img_dir, exist_ok=True)
        os.makedirs(out_mask_dir, exist_ok=True)
        # save the augmented image and its rotated masks
        imageio.imwrite("{}/{}_rot{}.png".format(out_img_dir, key, counter), image_aug)
        for i, mask in enumerate(masks_aug):
            imageio.imwrite("{}/{}_rot{}_{}.png".format(out_mask_dir, key, counter, i), mask)

    # loop through and augment each image three times, leaving the masks untouched
    counter2 = 0
    while counter2 < 3:
        # only the image is augmented, so no deterministic copy is needed
        image_aug = transformers.augment_image(base_image)
        counter2 += 1
        # create the output directories
        output_dir2 = "{}/{}_aug{}".format(args.output_dir, key, counter2)
        out_img_dir2 = "{}/{}".format(output_dir2, "images")
        out_mask_dir2 = "{}/{}".format(output_dir2, "masks")
        os.makedirs(out_img_dir2, exist_ok=True)
        os.makedirs(out_mask_dir2, exist_ok=True)
        # save the augmented image and copy the original masks unchanged
        imageio.imwrite("{}/{}_aug{}.png".format(out_img_dir2, key, counter2), image_aug)
        for i, mask in enumerate(base_masks):
            imageio.imwrite("{}/{}_aug{}_{}.png".format(out_mask_dir2, key, counter2, i), mask)
```
Hey @summerela, two questions:
1) Is there a reason why you do not apply the rotations and transforms in one go?
2) Do you have a reference for the introduction of noise with more than 3 augmentations?
Hello there!
I do the rotations and transforms in separate steps because I want to rotate the masks along with the images, but I do not want to apply the transformations to the original masks (a sketch of how the two steps could still be combined in one pass follows below). I'm sure I could have done this more elegantly. I have also been looking into whether I should still do on-the-fly augmentation during training to help prevent overfitting.
I should clarify: no more than 3 augmentations per single transform of an image. Three is a rule of thumb that I have seen in numerous articles and software documentation. The basic premise is not to change the original image so much that it no longer represents what you're trying to classify. For example, if you scramble a picture of a dog so much that it looks just like background, you will actually hamper your model's ability to distinguish between a dog and the background.
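For what it's worth, the two sequences could in principle be combined into a single pass by using imgaug's hooks mechanism to deactivate the non-geometric augmenters while the masks are processed. A sketch, assuming the `rotators`/`transformers` sequences from the script above (the explicit `name` argument, the shortened transformer list and the dummy arrays are assumptions for illustration):

```python
import numpy as np
import imgaug as ia
from imgaug import augmenters as iaa

rotators = iaa.SomeOf((1, 3), [iaa.Fliplr(1), iaa.Flipud(1), iaa.Rot90(1)],
                      random_order=True)
transformers = iaa.SomeOf((1, 3), [iaa.GaussianBlur(sigma=(0.0, 5.0)),
                                   iaa.AverageBlur(k=(2, 7))],
                          random_order=True, name="transformers")
seq = iaa.Sequential([rotators, transformers])

def activator_masks(images, augmenter, parents, default):
    # skip the whole non-geometric branch when augmenting masks
    return False if augmenter.name == "transformers" else default

base_image = np.zeros((128, 128, 3), dtype=np.uint8)   # stand-ins for real data
base_masks = [np.zeros((128, 128, 3), dtype=np.uint8)]

seq_det = seq.to_deterministic()
image_aug = seq_det.augment_image(base_image)
masks_aug = seq_det.augment_images(base_masks,
                                   hooks=ia.HooksImages(activator=activator_masks))
```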