PaddlePaddle / PaddleDetection

Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.
Apache License 2.0

Data augmentation issues #8517

Closed GuoQuanhao closed 8 months ago

GuoQuanhao commented 1 year ago

Search before asking

Bug Component

Training

Describe the Bug

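RandomDistort, Mosaic, and Cutout in PaddleDetection have problems. The brightness adjustment in RandomDistort produces the following result (image 000000000120). PaddleDetection's brightness adjustment code is as follows: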

def apply_brightness(self, img):
    low, high, prob = self.brightness
    if np.random.uniform(0., 1.) < prob:
        return img
    delta = np.random.uniform(low, high)
    img = img.astype(np.float32)
    img += delta
    return img

paddle.vision provides a corresponding brightness adjustment, and it behaves correctly when used instead.
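
For comparison, here is a minimal sketch of a multiplicative brightness adjustment. It assumes the (low, high) range is meant as a scale factor rather than an additive offset, which the issue does not state explicitly. The function name `scale_brightness` and its defaults are hypothetical and are not part of PaddleDetection or paddle.vision.

```python
import numpy as np


def scale_brightness(img, low=0.5, high=1.5, rng=None):
    """Scale pixel intensities by a random factor drawn from [low, high].

    Hypothetical helper: assumes the range is a multiplicative factor.
    """
    rng = rng or np.random.default_rng()
    # Cast to a Python float so the float32 image is not promoted to float64.
    factor = float(rng.uniform(low, high))
    out = img.astype(np.float32) * factor  # multiply instead of adding an offset
    return np.clip(out, 0.0, 255.0)        # keep values in the valid pixel range


img = np.full((2, 2, 3), 100, dtype=np.uint8)
# low == high pins the factor to 1.2 so the demo is deterministic.
brighter = scale_brightness(img, low=1.2, high=1.2)
```

With an additive offset drawn from a range such as [0.5, 1.5], the image changes by about one intensity level at most, so the augmentation has almost no visible effect; a multiplicative factor in the same range changes brightness noticeably.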

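The Mosaic augmentation results are shown below. In the first image, several labels are stacked in the bottom-right corner; in the second image, the labels on the left side are not clipped. (images 000000000885 and 000000000880)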

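In addition, the Cutout augmentation only processes the image and does not process the labels. PaddleDetection's Mosaic is based on YOLOX. I adapted the YOLOX Mosaic and inserted it into PaddleDetection, where it behaves correctly, as shown below: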

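import math
import random
from collections.abc import Sequence
from numbers import Integral

import cv2
import numpy as np

# register_op and BaseOperator are assumed to be in scope already, for example
# when this class is pasted into ppdet/data/transform/operators.py.
@register_op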
class Mosaic(BaseOperator):
    """ Mosaic operator for image and gt_bboxes
    The code is based on https://github.com/Megvii-BaseDetection/YOLOX/blob/main/yolox/data/datasets/mosaicdetection.py

    1. get mosaic coords
    2. clip bbox and get mosaic_labels
    3. random_affine augment
    4. Mixup augment as copypaste (optional), not used in tiny/nano

    Args:
        prob (float): probability of using Mosaic, 1.0 as default
        input_dim (list[int]): input shape
        degrees (list[2]): the rotate range to apply, transform range is [min, max]
        translate (list[2]): the translate range to apply, transform range is [min, max]
        scale (list[2]): the scale range to apply, transform range is [min, max]
        shear (list[2]): the shear range to apply, transform range is [min, max]
        enable_mixup (bool): whether to enable Mixup or not
        mixup_prob (float): probability of using Mixup, 1.0 as default
        mixup_scale (list[int]): scale range of Mixup
    """

    # prob: 1.0
    # input_dim: [864, 608]
    # degrees: [0, 0]
    # scale: [0.5, 2.0]
    # shear: [-0.5, 0.5]
    # translate: [-0.1, 0.1]
    # enable_mixup: True

    def __init__(self,
                 prob=1.0,
                 input_dim=[640, 640],
                 degrees=[-10, 10],
                 translate=[-0.1, 0.1],
                 scale=[0.5, 1.5],
                 shear=[-2, 2],
                 enable_mixup=True,
                 mixup_prob=1.0,
                 mixup_scale=[0.5, 1.5]):
        super(Mosaic, self).__init__()
        self.prob = prob
        if isinstance(input_dim, Integral):
            input_dim = [input_dim, input_dim]
        self.input_dim = input_dim
        # Keep the [min, max] ranges so that a new value is sampled on every
        # call; get_aug_params handles two-element sequences.
        self.degrees = degrees
        self.translate = translate
        self.scale = scale
        self.shear = shear
        self.enable_mixup = enable_mixup
        self.mixup_prob = mixup_prob
        self.mixup_scale = mixup_scale

    def get_mosaic_coordinate(self, mosaic_image, mosaic_index, xc, yc, w, h, input_h, input_w):
        # TODO update doc
        # index0 to top left part of image
        if mosaic_index == 0:
            x1, y1, x2, y2 = max(xc - w, 0), max(yc - h, 0), xc, yc
            small_coord = w - (x2 - x1), h - (y2 - y1), w, h
        # index1 to top right part of image
        elif mosaic_index == 1:
            x1, y1, x2, y2 = xc, max(yc - h, 0), min(xc + w, input_w * 2), yc
            small_coord = 0, h - (y2 - y1), min(w, x2 - x1), h
        # index2 to bottom left part of image
        elif mosaic_index == 2:
            x1, y1, x2, y2 = max(xc - w, 0), yc, xc, min(input_h * 2, yc + h)
            small_coord = w - (x2 - x1), 0, w, min(y2 - y1, h)
        # index3 to bottom right part of image
        elif mosaic_index == 3:
            x1, y1, x2, y2 = xc, yc, min(xc + w, input_w * 2), min(input_h * 2, yc + h)  # noqa
            small_coord = 0, 0, min(w, x2 - x1), min(y2 - y1, h)
        return (x1, y1, x2, y2), small_coord

    def get_aug_params(self, value, center=0):
        if isinstance(value, float):
            return random.uniform(center - value, center + value)
        elif len(value) == 2:
            return random.uniform(value[0], value[1])
        else:
            raise ValueError(
                "Affine params should be either a sequence containing two values "
                "or single float values. Got {}".format(value)
            )

    def get_affine_matrix(
        self,
        target_size,
        degrees=10,
        translate=0.1,
        scales=0.1,
        shear=10,
    ):
        twidth, theight = target_size

        # Rotation and Scale
        angle = self.get_aug_params(degrees)
        scale = self.get_aug_params(scales, center=1.0)

        if scale <= 0.0:
            raise ValueError("Argument scale should be positive")

        R = cv2.getRotationMatrix2D(angle=angle, center=(0, 0), scale=scale)

        M = np.ones([2, 3])
        # Shear
        shear_x = math.tan(self.get_aug_params(shear) * math.pi / 180)
        shear_y = math.tan(self.get_aug_params(shear) * math.pi / 180)

        M[0] = R[0] + shear_y * R[1]
        M[1] = R[1] + shear_x * R[0]

        # Translation
        translation_x = self.get_aug_params(translate) * twidth  # x translation (pixels)
        translation_y = self.get_aug_params(translate) * theight  # y translation (pixels)

        M[0, 2] = translation_x
        M[1, 2] = translation_y

        return M, scale

    def apply_affine_to_bboxes(self, targets, target_size, M, scale):
        num_gts = len(targets)

        # warp corner points
        twidth, theight = target_size
        corner_points = np.ones((4 * num_gts, 3))
        corner_points[:, :2] = targets[:, [0, 1, 2, 3, 0, 3, 2, 1]].reshape(
            4 * num_gts, 2
        )  # x1y1, x2y2, x1y2, x2y1
        corner_points = corner_points @ M.T  # apply affine transform
        corner_points = corner_points.reshape(num_gts, 8)

        # create new boxes
        corner_xs = corner_points[:, 0::2]
        corner_ys = corner_points[:, 1::2]
        new_bboxes = (
            np.concatenate(
                (corner_xs.min(1), corner_ys.min(1), corner_xs.max(1), corner_ys.max(1))
            )
            .reshape(4, num_gts)
            .T
        )

        # clip boxes
        new_bboxes[:, 0::2] = new_bboxes[:, 0::2].clip(0, twidth)
        new_bboxes[:, 1::2] = new_bboxes[:, 1::2].clip(0, theight)

        targets[:, :4] = new_bboxes

        return targets

    def random_affine(
        self,
        img,
        targets=(),
        target_size=(640, 640),
        degrees=10,
        translate=0.1,
        scales=0.1,
        shear=10,
    ):
        M, scale = self.get_affine_matrix(target_size, degrees, translate, scales, shear)

        img = cv2.warpAffine(img, M, dsize=target_size, borderValue=(114, 114, 114))

        # Transform label coordinates
        if len(targets) > 0:
            targets = self.apply_affine_to_bboxes(targets, target_size, M, scale)

        return img, targets

    def adjust_box_anns(self, bbox, scale_ratio, padw, padh, w_max, h_max):
        bbox[:, 0::2] = np.clip(bbox[:, 0::2] * scale_ratio + padw, 0, w_max)
        bbox[:, 1::2] = np.clip(bbox[:, 1::2] * scale_ratio + padh, 0, h_max)
        return bbox

    def mixup(self, origin_img, origin_labels, input_dim, cp_index):
        jit_factor = random.uniform(*self.mixup_scale)
        FLIP = random.uniform(0, 1) > 0.5
        img, gt_bbox, gt_class, is_crowd = cp_index['image'], cp_index['gt_bbox'], cp_index['gt_class'], cp_index['is_crowd']
        cp_labels = np.concatenate([gt_bbox, gt_class, is_crowd], 1)

        if len(img.shape) == 3:
            cp_img = np.ones((input_dim[0], input_dim[1], 3), dtype=np.uint8) * 114
        else:
            cp_img = np.ones(input_dim, dtype=np.uint8) * 114

        cp_scale_ratio = min(input_dim[0] / img.shape[0], input_dim[1] / img.shape[1])
        resized_img = cv2.resize(
            img,
            (int(img.shape[1] * cp_scale_ratio), int(img.shape[0] * cp_scale_ratio)),
            interpolation=cv2.INTER_LINEAR,
        )

        cp_img[
            : int(img.shape[0] * cp_scale_ratio), : int(img.shape[1] * cp_scale_ratio)
        ] = resized_img

        cp_img = cv2.resize(
            cp_img,
            (int(cp_img.shape[1] * jit_factor), int(cp_img.shape[0] * jit_factor)),
        )
        cp_scale_ratio *= jit_factor

        if FLIP:
            cp_img = cp_img[:, ::-1, :]

        origin_h, origin_w = cp_img.shape[:2]
        target_h, target_w = origin_img.shape[:2]
        padded_img = np.zeros(
            (max(origin_h, target_h), max(origin_w, target_w), 3), dtype=np.uint8
        )
        padded_img[:origin_h, :origin_w] = cp_img

        x_offset, y_offset = 0, 0
        if padded_img.shape[0] > target_h:
            y_offset = random.randint(0, padded_img.shape[0] - target_h - 1)
        if padded_img.shape[1] > target_w:
            x_offset = random.randint(0, padded_img.shape[1] - target_w - 1)
        padded_cropped_img = padded_img[
            y_offset: y_offset + target_h, x_offset: x_offset + target_w
        ]

        cp_bboxes_origin_np = self.adjust_box_anns(
            cp_labels[:, :4].copy(), cp_scale_ratio, 0, 0, origin_w, origin_h
        )
        if FLIP:
            cp_bboxes_origin_np[:, 0::2] = (
                origin_w - cp_bboxes_origin_np[:, 0::2][:, ::-1]
            )
        cp_bboxes_transformed_np = cp_bboxes_origin_np.copy()
        cp_bboxes_transformed_np[:, 0::2] = np.clip(
            cp_bboxes_transformed_np[:, 0::2] - x_offset, 0, target_w
        )
        cp_bboxes_transformed_np[:, 1::2] = np.clip(
            cp_bboxes_transformed_np[:, 1::2] - y_offset, 0, target_h
        )

        cls_labels = cp_labels[:, 4:6].copy()
        box_labels = cp_bboxes_transformed_np
        labels = np.hstack((box_labels, cls_labels))
        origin_labels = np.vstack((origin_labels, labels))
        origin_img = origin_img.astype(np.float32)
        origin_img = 0.5 * origin_img + 0.5 * padded_cropped_img.astype(np.float32)

        return origin_img.astype(np.uint8), origin_labels

    def __call__(self, sample, context=None):
        if not isinstance(sample, Sequence):
            return sample
        assert len(
            sample) == 5, "Mosaic needs 5 samples, 4 for mosaic and 1 for mixup."
        if random.random() < self.prob:
            input_h, input_w = self.input_dim[0], self.input_dim[1]
            yc = int(random.uniform(0.5 * input_h, 1.5 * input_h))
            xc = int(random.uniform(0.5 * input_w, 1.5 * input_w))
            mosaic_labels = []
            # dict_keys(['im_id', 'h', 'w', 'is_crowd', 'gt_class', 'gt_bbox', 'curr_iter', 'image', 'im_shape', 'scale_factor'])
            for i_mosaic, index in enumerate(sample[:4]):
                img, gt_bbox, gt_class, is_crowd = index['image'], index['gt_bbox'], index['gt_class'], index['is_crowd']
                _labels = np.concatenate([gt_bbox, gt_class, is_crowd], 1)
                h0, w0 = index['h'], index['w']
                scale = min(1. * input_h / h0, 1. * input_w / w0)
                img = cv2.resize(
                        img, (int(w0 * scale), int(h0 * scale)), interpolation=cv2.INTER_LINEAR
                )
                (h, w, c) = img.shape[:3]
                # generate output mosaic image
                if i_mosaic == 0:
                    mosaic_img = np.full((input_h * 2, input_w * 2, c), 114, dtype=np.uint8)
                # suffix l means large image, while s means small image in mosaic aug.
                (l_x1, l_y1, l_x2, l_y2), (s_x1, s_y1, s_x2, s_y2) = self.get_mosaic_coordinate(
                    mosaic_img, i_mosaic, xc, yc, w, h, input_h, input_w
                )
                mosaic_img[l_y1:l_y2, l_x1:l_x2] = img[s_y1:s_y2, s_x1:s_x2]
                padw, padh = l_x1 - s_x1, l_y1 - s_y1
                labels = _labels.copy()
                # Boxes are pixel xyxy on the source image: rescale and shift them onto the mosaic canvas
                if len(_labels) > 0:
                    labels[:, 0] = scale * _labels[:, 0] + padw
                    labels[:, 1] = scale * _labels[:, 1] + padh
                    labels[:, 2] = scale * _labels[:, 2] + padw
                    labels[:, 3] = scale * _labels[:, 3] + padh
                mosaic_labels.append(labels)

            if len(mosaic_labels):
                mosaic_labels = np.concatenate(mosaic_labels, 0)
                np.clip(mosaic_labels[:, 0], 0, 2 * input_w, out=mosaic_labels[:, 0])
                np.clip(mosaic_labels[:, 1], 0, 2 * input_h, out=mosaic_labels[:, 1])
                np.clip(mosaic_labels[:, 2], 0, 2 * input_w, out=mosaic_labels[:, 2])
                np.clip(mosaic_labels[:, 3], 0, 2 * input_h, out=mosaic_labels[:, 3])

            mosaic_img, mosaic_labels = self.random_affine(
                mosaic_img,
                mosaic_labels,
                target_size=(input_w, input_h),
                degrees=self.degrees,
                translate=self.translate,
                scales=self.scale,
                shear=self.shear,
            )

            # -----------------------------------------------------------------
            # CopyPaste: https://arxiv.org/abs/2012.07177
            # -----------------------------------------------------------------
            if (
                self.enable_mixup
                and not len(mosaic_labels) == 0
                and random.random() < self.mixup_prob
            ):
                mosaic_img, mosaic_labels = self.mixup(mosaic_img, mosaic_labels, self.input_dim, sample[-1])

            # -----------------------------------------------------------------
            # img_info and img_id are not used for training.
            # They are also hard to be specified on a mosaic image.
            # -----------------------------------------------------------------
            valid_mosaic_labels = []
            for item in mosaic_labels:
                x0, y0, x1, y1 = item[:4]
                if (x1 - x0) * (y1 - y0) > 0:
                    valid_mosaic_labels.append(item)
            valid_mosaic_labels = np.array(valid_mosaic_labels)
            if len(valid_mosaic_labels) > 0:
                sample0 = sample[0]
                sample0['image'] = mosaic_img.astype(np.uint8)  # can not be float32
                sample0['h'] = float(mosaic_img.shape[0])
                sample0['w'] = float(mosaic_img.shape[1])
                sample0['im_shape'][0] = sample0['h']
                sample0['im_shape'][1] = sample0['w']
                sample0['gt_bbox'] = valid_mosaic_labels[:, :4]
                sample0['gt_class'] = valid_mosaic_labels[:, 4:5].astype(np.float32)
                sample0['is_crowd'] = valid_mosaic_labels[:, 5:6].astype(np.float32).reshape([-1, 1])
                return sample0
            else:
                return sample[0]
        else:
            return sample[0]
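
To make the tile placement easier to follow, the sketch below restates the logic of get_mosaic_coordinate as a standalone function and runs it for all four tiles. The function name `mosaic_tile_coords` and the example sizes are hypothetical and chosen only for illustration.

```python
def mosaic_tile_coords(index, xc, yc, w, h, input_h, input_w):
    """Return (canvas_box, source_box), each as (x1, y1, x2, y2), for one tile.

    canvas_box is the region on the 2*input_h x 2*input_w mosaic canvas,
    source_box is the crop taken from the resized w x h tile image.
    """
    if index == 0:    # top left: tile's bottom-right corner touches (xc, yc)
        x1, y1, x2, y2 = max(xc - w, 0), max(yc - h, 0), xc, yc
        src = (w - (x2 - x1), h - (y2 - y1), w, h)
    elif index == 1:  # top right
        x1, y1, x2, y2 = xc, max(yc - h, 0), min(xc + w, input_w * 2), yc
        src = (0, h - (y2 - y1), min(w, x2 - x1), h)
    elif index == 2:  # bottom left
        x1, y1, x2, y2 = max(xc - w, 0), yc, xc, min(input_h * 2, yc + h)
        src = (w - (x2 - x1), 0, w, min(y2 - y1, h))
    elif index == 3:  # bottom right
        x1, y1, x2, y2 = xc, yc, min(xc + w, input_w * 2), min(input_h * 2, yc + h)
        src = (0, 0, min(w, x2 - x1), min(y2 - y1, h))
    else:
        raise ValueError("index must be 0, 1, 2 or 3, got {}".format(index))
    return (x1, y1, x2, y2), src


# Hypothetical example: 640x640 input, mosaic centre (500, 700), 640x480 tiles.
results = [
    mosaic_tile_coords(i, xc=500, yc=700, w=640, h=480, input_h=640, input_w=640)
    for i in range(4)
]
for i, (canvas_box, source_box) in enumerate(results):
    print(i, canvas_box, source_box)
```

The canvas region and the source crop always have the same width and height, which is what allows the direct slice assignment into mosaic_img. The offsets padw and padh used for the labels are the differences between the two top-left corners.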

Environment

Bug description confirmation

Are you willing to submit a PR?

GuoQuanhao commented 1 year ago

You can visualize the data augmentation by setting - DebugVisibleImage: {} in the yml file.

GuoQuanhao commented 1 year ago

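After verification, the following augmentations all have problems: ('TranslateX_BBox', 1.0, 8) ('TranslateY_BBox', 1.0, 8) ('BBox_Cutout', 1.0, 30) ('Rotate_BBox', 1.0, 30) ('Cutout', 1.0, 30) ('ShearY_BBox', 1.0, 30). The bbox-related augmentations do not account for boxes being truncated or cropped out after the image is transformed. Moreover, gt_class and is_crowd should be carried through the augmentation and transformed together with the boxes, so that when a box is clipped or disappears, gt_class and is_crowd are updated in sync.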
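
A minimal sketch of the synchronized handling described above: clip the transformed boxes to the image, then drop degenerate boxes together with their gt_class and is_crowd entries. The helper name `clip_and_filter` and the `min_area` threshold are hypothetical; this is not the code of any PaddleDetection operator.

```python
import numpy as np


def clip_and_filter(gt_bbox, gt_class, is_crowd, width, height, min_area=1.0):
    """Clip xyxy boxes to the image and drop degenerate ones with their labels.

    Hypothetical helper; gt_bbox is (N, 4), gt_class and is_crowd are (N, 1).
    """
    boxes = gt_bbox.astype(np.float32).copy()
    boxes[:, 0::2] = np.clip(boxes[:, 0::2], 0, width)   # x1, x2
    boxes[:, 1::2] = np.clip(boxes[:, 1::2], 0, height)  # y1, y2
    area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = area >= min_area
    # The same mask is applied to every per-box field so they stay aligned.
    return boxes[keep], gt_class[keep], is_crowd[keep]


gt_bbox = np.array(
    [[10, 10, 50, 50],       # fully inside
     [-30, 20, 40, 80],      # partly outside on the left: gets clipped
     [700, 100, 760, 160]],  # fully outside a 640x480 image: gets dropped
    dtype=np.float32,
)
gt_class = np.array([[1], [2], [3]], dtype=np.int32)
is_crowd = np.array([[0], [0], [1]], dtype=np.int32)

boxes, classes, crowd = clip_and_filter(
    gt_bbox, gt_class, is_crowd, width=640, height=480
)
```

Filtering all three arrays with one boolean mask is the simplest way to guarantee that the number of boxes and the number of labels never diverge after an augmentation removes a box.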

GuoQuanhao commented 1 year ago

@register_op
class AutoAugment(BaseOperator):

This class does not handle gt_class either. Every augmentation in PaddleDetection that involves box transformations has this problem.

GuoQuanhao commented 1 year ago

I have finished the modifications, with reference to https://github.com/ZhenglinZhou/Data_Augmentation_Zoo_for_Object_Detection/tree/master/augmentation_zoo

https://www.kaggle.com/code/kaushal2896/data-augmentation-tutorial-basic-cutout-mixup

I can submit a PR if there is a need 😄

nissansz commented 1 year ago

When training layout analysis with Paddle, do the PubLayNet labels need to be converted for Paddle, or can they be used directly after downloading?

GuoQuanhao commented 1 year ago

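When training layout analysis with Paddle, do the PubLayNet labels need to be converted for Paddle, or can they be used directly after downloading?

I haven't used that dataset; I used CDLA.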

nissansz commented 1 year ago

What format are the CDLA labels in? Could you post an example?