facebookresearch / maskrcnn-benchmark

Fast, modular reference implementation of Instance Segmentation and Object Detection algorithms in PyTorch.
MIT License
9.29k stars 2.5k forks

Does the training include augmentation? #296

Open mattans opened 5 years ago

mattans commented 5 years ago

In the code, I stumbled upon functions for flipping, cropping, etc., but I did not find any reference to augmentation in the docs. Are there online augmentations applied when invoking train_net.py? I would assume so, since the COCO dataset is non-augmented.

fmassa commented 5 years ago

The only data augmentation that we currently do is random horizontal flipping. On a private branch I added support for random crops, but didn't notice much difference on COCO, and we also need to pay attention to crops that do not have any annotations, which makes things a bit more complicated.

But just wrapping around a Crop operator is fairly easy.
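For reference, the idea behind the horizontal-flip transform can be sketched on plain box tuples (a toy sketch only; the real transform in maskrcnn_benchmark/data/transforms/transforms.py operates on PIL images and BoxList targets):

```python
import random

class RandomHorizontalFlip:
    """Flip boxes left-right with probability `prob` (toy sketch)."""
    def __init__(self, prob=0.5):
        self.prob = prob

    def __call__(self, image_width, boxes):
        # boxes are (x1, y1, x2, y2) in pixel coordinates
        if random.random() < self.prob:
            boxes = [(image_width - x2, y1, image_width - x1, y2)
                     for (x1, y1, x2, y2) in boxes]
        return boxes

flip = RandomHorizontalFlip(prob=1.0)  # always flip, for demonstration
flipped = flip(100, [(10, 20, 30, 40)])
```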

mattans commented 5 years ago

Is there a way to avoid the random horizontal flipping?

fmassa commented 5 years ago

Just change https://github.com/facebookresearch/maskrcnn-benchmark/blob/61ffdb3803db562c23d883439aee16e599c051e6/maskrcnn_benchmark/data/transforms/build.py#L9 to be flip_prob = 0

txytju commented 5 years ago

I think random scaling of images could be a good augmentation. But when _C.DATALOADER.ASPECT_RATIO_GROUPING = True, it may not be easy to implement.

And would you like to share the random crop branch for augmentation?

fmassa commented 5 years ago

Random scaling is fairly easy to implement, and is already implemented in https://github.com/facebookresearch/maskrcnn-benchmark/pull/69/files#diff-994ea1c02fd7e675924e1add66561d27
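The gist of multi-scale training is to randomly pick the target "min side" per image while capping the longer side; a rough sketch (function name and default sizes are illustrative, not from the PR):

```python
import random

def get_resize_size(image_size, min_sizes=(640, 672, 704, 736, 768, 800),
                    max_size=1333):
    """Pick a random target shorter side and compute the resized (w, h),
    preserving aspect ratio and capping the longer side at max_size."""
    w, h = image_size
    size = random.choice(min_sizes)
    min_original = min(w, h)
    max_original = max(w, h)
    # shrink the target if the longer side would exceed max_size
    if max_original / min_original * size > max_size:
        size = int(round(max_size * min_original / max_original))
    if w < h:
        return size, int(size * h / w)
    return int(size * w / h), size
```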

About random crop, yeah, I can definitely share the code, but for my case it hasn't improved performance. Also, you might need to handle a few cases as I mentioned before, so it might not be easy.

zimenglan-sysu-512 commented 5 years ago

hi @fmassa, can you share the code for random crop? I want to try it to handle larger-resolution images, and some of my images have more than 500 annotations. Thanks.

fmassa commented 5 years ago

@zimenglan-sysu-512 I don't know where it is anymore (so many branches, in so many clones of the repo :-) ), but it should be fairly straightforward to do, something like

import random

from torchvision.transforms import functional as F


class RandomCrop(object):
    def __init__(self, size):
        self.size = size

    @staticmethod
    def get_params(img, output_size):
        """Get parameters for ``crop`` for a random crop.
        Args:
            img (PIL Image): Image to be cropped.
            output_size (tuple): Expected output size of the crop.
        Returns:
            tuple: params (i, j, h, w) to be passed to ``crop`` for random crop.
        """
        w, h = img.size
        th, tw = output_size
        if w == tw and h == th:
            return 0, 0, h, w

        # assumes the image is at least as large as the crop
        i = random.randint(0, h - th)
        j = random.randint(0, w - tw)
        return i, j, th, tw

    def __call__(self, image, target):
        i, j, h, w = self.get_params(image, self.size)
        image = F.crop(image, i, j, h, w)
        # BoxList.crop expects an (xmin, ymin, xmax, ymax) box
        target = target.crop((j, i, j + w, i + h))
        return image, target

NOTE: this is untested
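One of the complications mentioned above is crops that end up with no annotations; a common workaround is to resample the crop until at least one box center survives. A sketch (names and the retry strategy are illustrative, not from the repo):

```python
import random

def safe_random_crop_params(boxes, image_size, crop_size, max_tries=50):
    """Sample (i, j, th, tw) crop params until at least one box center
    falls inside the crop; fall back to no crop after max_tries."""
    w, h = image_size
    th, tw = crop_size
    for _ in range(max_tries):
        i = random.randint(0, h - th)
        j = random.randint(0, w - tw)
        if any(j <= (x1 + x2) / 2 <= j + tw and i <= (y1 + y2) / 2 <= i + th
               for (x1, y1, x2, y2) in boxes):
            return i, j, th, tw
    return 0, 0, h, w  # give up and keep the full image
```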

zimenglan-sysu-512 commented 5 years ago

hi @fmassa does it need to deal with the polygon and keypoints correspondingly?

zimenglan-sysu-512 commented 5 years ago

by the way, how to do the crop for keypoints?

fmassa commented 5 years ago

@zimenglan-sysu-512 normally the target already contains the polygons and the keypoints, so that should be enough, unless the crop implementation there is not correct for this use-case.

fmassa commented 5 years ago

@zimenglan-sysu-512 crop is not implemented for keypoints, but it shouldn't be hard: you need to shift the coordinates and update the visibility flags.
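Shifting and re-flagging keypoints might look like this (a sketch only; the repo's PersonKeypoints structure stores (x, y, visibility) triplets, and the function name here is made up):

```python
def crop_keypoints(keypoints, i, j, h, w):
    """Shift (x, y, v) keypoints into the crop's coordinate frame and
    mark points that fall outside the crop as not visible."""
    out = []
    for (x, y, v) in keypoints:
        nx, ny = x - j, y - i
        if v > 0 and 0 <= nx < w and 0 <= ny < h:
            out.append((nx, ny, v))
        else:
            out.append((0, 0, 0))  # COCO convention for "not labeled"
    return out
```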

zimenglan-sysu-512 commented 5 years ago

i will try it. thanks.

zimenglan-sysu-512 commented 5 years ago

hi @fmassa is it also necessary to remove invalid bboxes?

fmassa commented 5 years ago

I don't think it's needed

maxsenh commented 5 years ago

@fmassa While doing data augmentation, the data set increases in size, right? During horizontal flip, are two images then returned to train with instead of only one? For my case, data augmentation might be a good option because my data set only consists of ~200 images. Would you rather perform online or offline data augmentation?

madurner commented 5 years ago

@fmassa I am fine-tuning on a pretty challenging dataset (industrial objects, almost no texture, etc.), and the results are not really good... So I was wondering if augmentation could help there. Is there a reason why no color augmentations (e.g. ColorJitter) are used? I also wanted to augment the images with little black squares. However, this would also influence the masks somehow, right? If so, is there an easy way to add the black squares?
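On the black squares: they are usually treated as occlusion (cutout-style), so masks and boxes are typically left untouched, since the object is still there, just hidden. A toy sketch on a nested-list grayscale image (not a repo API):

```python
import random

def random_black_square(image, size, fill=0):
    """Paint one random size x size square with `fill`, in place."""
    h, w = len(image), len(image[0])
    y = random.randint(0, h - size)
    x = random.randint(0, w - size)
    for r in range(y, y + size):
        for c in range(x, x + size):
            image[r][c] = fill
    return image

# with size == image size, the whole image is blanked (deterministic demo)
demo = random_black_square([[1] * 4 for _ in range(4)], size=4)
```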

johncorring commented 5 years ago

@fmassa @zimenglan-sysu-512 Maybe I misunderstand the reply to:

hi @fmassa is it also needed to remove invalid bbox?

... But you should definitely remove invalid boxes after random cropping; otherwise you will have partial or even (0,0)-(0,0) boxes lying all over the place (depending on crop size). For loss purposes, I am not yet clear on how empty boxes are handled in this codebase, but you definitely want to remove boxes with significantly reduced area (use a before-vs-after overlap condition). Unless I am missing something?
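That before-vs-after condition can be sketched like this (thresholds and names are made up for illustration, not from the codebase):

```python
def box_area(b):
    x1, y1, x2, y2 = b
    return max(0, x2 - x1) * max(0, y2 - y1)

def filter_cropped_boxes(orig_boxes, cropped_boxes,
                         min_keep_ratio=0.3, min_side=2):
    """Return indices of boxes worth keeping after a crop: enough of
    the original area survives and both sides are non-degenerate."""
    keep = []
    for idx, (ob, cb) in enumerate(zip(orig_boxes, cropped_boxes)):
        x1, y1, x2, y2 = cb
        if (x2 - x1) >= min_side and (y2 - y1) >= min_side \
                and box_area(cb) >= min_keep_ratio * box_area(ob):
            keep.append(idx)
    return keep
```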

chengyangfu commented 5 years ago

I think these empty or very narrow boxes will cause problems in training, especially for mask prediction. In one of my private datasets, narrow boxes such as (3, 50) or (40, 2) caused the mask alignment part to break.

zimenglan-sysu-512 commented 5 years ago

hi @johncorring, as @chengyangfu says, you need to deal with the empty or invalid bboxes when training, especially when you use random crop to process large-resolution images (where you can drop some bboxes), like the DOTA dataset, or just to augment the data.

YubinXie commented 5 years ago

How about rotation augmentation? Is there a way to add options from torchvision for the transforms?

johncorring commented 5 years ago

@YubinXie Rotation augmentation isn't used that often for detection, though in some domains rotations of 90, 180, 270 degrees can be useful. For the general case, you have to deal both with the introduced undefined regions and with the fact that rotating the object may introduce unwanted loss of precision in the ground truth (think about rotating a long rectangle and then recomputing its axis-aligned bbox). For detection I don't think rotation augmentation is worth it unless you are doing quadrangle detection, as is frequently done in OCR. For segmentation I suppose it could be useful? I don't know.

YubinXie commented 5 years ago

@johncorring Yes, it is for segmentation. When doing rotation augmentation, the rotation can be applied to the input image and the label (mask), and the bbox can be regenerated from the mask instead of rotating the original one. In the TF version of Mask R-CNN (https://github.com/matterport/Mask_RCNN), augmentation is performed with https://github.com/aleju/imgaug and it works pretty well (and is easy). The aim of rotation augmentation is to handle rotated objects in domain images.
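Regenerating the bbox from the rotated mask, as described, is a few lines; a sketch on a binary nested-list mask (function name is illustrative):

```python
def bbox_from_mask(mask):
    """Recompute a tight axis-aligned (x1, y1, x2, y2) box from a
    binary mask; returns None if the mask is empty (e.g. the object
    rotated out of frame)."""
    ys = [r for r, row in enumerate(mask) if any(row)]
    xs = [c for row in mask for c, v in enumerate(row) if v]
    if not ys:
        return None
    return min(xs), min(ys), max(xs) + 1, max(ys) + 1
```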

chenchr commented 5 years ago

@fmassa Hello, in the current code on the master branch (24c8c90), the data augmentation includes ColorJitter. Were the models reported in model_zoo.md trained with ColorJitter, or just with horizontal flipping like the Detectron configs?

ShihuaiXu commented 5 years ago

The only data augmentation that we currently do is random horizontal flipping. On a private branch I added support for random crops, but didn't notice much difference on COCO, and we also need to pay attention to crops that do not have any annotations, which makes things a bit more complicated.

But just wrapping around a Crop operator is fairly easy.

@fmassa, I just checked the code and found that horizontal flipping is there, but the size of the training set stays the same. So this method of augmentation does not increase the total number of training images, right?

ShihuaiXu commented 5 years ago

@fmassa While doing data augmentation, the data set is increasing its size right? During horizontal flip, are then two images returned to train with instead of only one? For my case, data augmentation might be a good option because my data set only consists of ~200 images. Would you rather perform online or offline data augmentation?

Have you found the answer? I think only one image is returned: the flip is applied online, so each image comes out either unchanged or flipped, not duplicated.