mattans opened 5 years ago
The only data augmentation that we currently do is random horizontal flipping. On a private branch I added support for random crops, but didn't notice much difference on COCO, and we also need to pay attention to crops that do not have any annotations, which makes things a bit more complicated.
But just wrapping around a Crop operator is fairly easy.
Is there a way to avoid the random horizontal flipping?
I think random scaling of images might be a good form of augmentation. But when _C.DATALOADER.ASPECT_RATIO_GROUPING = True, it may not be easy to implement.
Also, would you be willing to share the random-crop branch for augmentation?
Random scaling is fairly easy to implement, and is already implemented in https://github.com/facebookresearch/maskrcnn-benchmark/pull/69/files#diff-994ea1c02fd7e675924e1add66561d27
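For reference, a minimal sketch of such a transform (my names, untested; it assumes the target is a BoxList with a resize method, as used by this repo's Resize transform):

import random

from torchvision.transforms import functional as F


class RandomScale(object):
    def __init__(self, min_scale=0.75, max_scale=1.25):
        self.min_scale = min_scale
        self.max_scale = max_scale

    def __call__(self, image, target):
        scale = random.uniform(self.min_scale, self.max_scale)
        w, h = image.size
        # torchvision's functional.resize takes (height, width)
        image = F.resize(image, (int(round(h * scale)), int(round(w * scale))))
        # BoxList.resize takes (width, height), i.e. PIL's image.size
        target = target.resize(image.size)
        return image, target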
About random crop, yeah, I can definitely share the code, but for my case it hasn't improved performance. Also, you might need to handle a few cases as I mentioned before, so it might not be easy.
hi @fmassa, can you share the code for random crop? I want to try it to deal with larger-resolution images and images that have more than 500 annotations. Thanks.
@zimenglan-sysu-512 I don't know where it is anymore (so many branches, in so many clones of the repo :-) ), but it should be fairly straightforward to do, something like
import random

from torchvision.transforms import functional as F


class RandomCrop(object):
    def __init__(self, size):
        self.size = size

    @staticmethod
    def get_params(img, output_size):
        """Get parameters for ``crop`` for a random crop.

        Args:
            img (PIL Image): Image to be cropped.
            output_size (tuple): Expected output size of the crop as (height, width).

        Returns:
            tuple: params (i, j, h, w) to be passed to ``crop`` for a random crop.
        """
        w, h = img.size
        th, tw = output_size
        if w == tw and h == th:
            return 0, 0, h, w

        i = random.randint(0, h - th)
        j = random.randint(0, w - tw)
        return i, j, th, tw

    def __call__(self, image, target):
        i, j, h, w = self.get_params(image, self.size)
        image = F.crop(image, i, j, h, w)
        # BoxList.crop expects a single (xmin, ymin, xmax, ymax) box
        target = target.crop((j, i, j + w, i + h))
        return image, target
NOTE: this is untested
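If it works for you, a composition along these lines could be spliced into the pipeline built by build_transforms (a sketch only; the crop size is a made-up example, and the usual Resize/Normalize steps are omitted for brevity):

from maskrcnn_benchmark.data import transforms as T

transform = T.Compose([
    RandomCrop((600, 600)),
    T.RandomHorizontalFlip(0.5),
    T.ToTensor(),
])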
hi @fmassa, does it need to deal with the polygons and keypoints correspondingly? By the way, how do you do the crop for keypoints?
@zimenglan-sysu-512 normally target already contains the polygons and the keypoints, so that should be enough, unless the crop implementation there is not correct for this use case.
@zimenglan-sysu-512 for keypoints, crop is not implemented, but it shouldn't be hard: just play with the shifts and the visibility.
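A minimal sketch of those shifts (my helper, untested; it assumes keypoints are stored as an (N, K, 3) tensor of (x, y, visibility), as in this repo's keypoint structures):

def crop_keypoints(keypoints, i, j, h, w):
    # keypoints: (num_instances, num_keypoints, 3) torch tensor of (x, y, visibility)
    kp = keypoints.clone()
    kp[..., 0] -= j  # shift x by the crop's left edge
    kp[..., 1] -= i  # shift y by the crop's top edge
    inside = (
        (kp[..., 0] >= 0) & (kp[..., 0] < w) &
        (kp[..., 1] >= 0) & (kp[..., 1] < h)
    )
    # keypoints that fall outside the crop become invisible
    kp[..., 2] = kp[..., 2] * inside.to(kp.dtype)
    return kp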
I will try it. Thanks.
hi @fmassa, is it also necessary to remove invalid bboxes?
I don't think it's needed
@fmassa While doing data augmentation, the dataset increases in size, right? During a horizontal flip, are two images then returned to train with instead of only one? For my case, data augmentation might be a good option because my dataset only consists of ~200 images. Would you rather perform online or offline data augmentation?
@fmassa I am fine-tuning on a pretty challenging dataset (industrial objects, almost no texture, etc.), and the results are not really good... So I was wondering whether augmentation can help there. Is there a reason why no color augmentations (e.g. ColorJitter) are used? I also wanted to augment the images with little black squares. However, this would also influence the masks somehow, right? If so, is there an easy way to add the black squares?
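Not an official answer, but one cutout-style sketch (my names, untested): paint the squares on the image only and leave the target alone, so the boxes and masks still describe the full, now partially occluded, objects:

import random


class RandomBlackSquares(object):
    def __init__(self, num_squares=3, size=32):
        self.num_squares = num_squares
        self.size = size

    def __call__(self, image, target):
        image = image.copy()
        w, h = image.size
        for _ in range(self.num_squares):
            x = random.randint(0, max(0, w - self.size))
            y = random.randint(0, max(0, h - self.size))
            # paste a black rectangle over a random region of the RGB PIL image
            image.paste((0, 0, 0), (x, y, x + self.size, y + self.size))
        return image, target  # annotations are left unchanged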
@fmassa @zimenglan-sysu-512 Maybe I misunderstand the reply to:
hi @fmassa, is it also necessary to remove invalid bboxes?
.... But you should definitely remove invalid boxes after random cropping; otherwise you will have partial or even (0,0)-(0,0) boxes lying all over the place (depending on the crop size). For loss purposes, I am not clear on how empty boxes are handled in this codebase yet, but you definitely want to remove boxes with a significantly reduced area (use a before-vs-after overlap condition). Unless I am missing something?
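A minimal sketch of such a filter (my names, untested; it assumes the target is a BoxList in xyxy mode with an area method, as in this repo):

def remove_degenerate_boxes(target, areas_before, min_size=1, min_area_frac=0.3):
    # target: BoxList in "xyxy" mode after cropping
    # areas_before: per-box areas before the crop, in the same order
    xmin, ymin, xmax, ymax = target.bbox.unbind(dim=1)
    keep = (xmax - xmin >= min_size) & (ymax - ymin >= min_size)
    # drop boxes whose visible area shrank too much relative to the original
    keep &= target.area() / areas_before >= min_area_frac
    return target[keep]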
I think these empty or very narrow boxes will cause problems in training, especially for mask prediction. In one of my private datasets, narrow boxes such as (3, 50) or (40, 2) made the mask alignment part break.
hi @johncorring, as @chengyangfu says, you need to deal with empty or invalid bboxes when training, especially when you use random crop to process large-resolution images (where you can drop some bboxes), like the DOTA dataset, or just to augment the data.
How about rotation augmentation? Is there a way to add options from torchvision to the transforms?
@YubinXie Rotation augmentation isn't used that often for detection, though in some domains rotations of 90, 180, and 270 degrees could be useful. For the general case, you have to deal both with the introduced undefined regions and with the fact that rotating the object may introduce an unwanted loss of precision in the ground truth (think about rotating a long rectangle and then recomputing its axis-aligned bbox). For detection, I don't think rotation augmentation is worth it unless you are doing quadrangle detection, as is frequently done in OCR. For segmentation, I suppose it could be useful? I don't know.
@johncorring Yes, it is for segmentation. When doing rotation augmentation, the rotation can be applied to the input image and the label (mask), and the bbox can be generated from the mask instead of rotating the original one. In the TF version of Mask R-CNN (https://github.com/matterport/Mask_RCNN), the augmentation is performed by https://github.com/aleju/imgaug and it works pretty well (and is easy). The aim of rotation augmentation is to handle rotated objects in the domain images.
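A minimal sketch of recomputing the box from a rotated binary mask (my helper name):

import numpy as np


def bbox_from_mask(mask):
    # mask: 2D binary numpy array for a single instance after rotation
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None  # the object was rotated completely out of the frame
    return xs.min(), ys.min(), xs.max() + 1, ys.max() + 1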
@fmassa Hello, from the current code on the master branch (24c8c90), the data augmentation includes ColorJitter. Is the model reported in model_zoo.md trained with ColorJitter, or just with horizontal flipping like the Detectron config?
@fmassa, I just checked the code and found that horizontal flipping exists, but the size of the training set stays the same, so this augmentation method does not increase the total number of training images, right?
@fmassa While doing data augmentation, the dataset increases in size, right? During a horizontal flip, are two images then returned to train with instead of only one? For my case, data augmentation might be a good option because my dataset only consists of ~200 images. Would you rather perform online or offline data augmentation?
Have you found the answer? I think only one image remains.
In the code, I stumbled upon functions for flipping, cropping, etc.; however, I did not find any reference to augmentation in the docs. Are some online augmentations applied when invoking train_net.py? I would assume yes, as the COCO dataset is not pre-augmented.