Data augmentation issues

GuoQuanhao commented 1 year ago

Search before asking

[X] I have searched the issues and found no similar bug report.

Bug Component

Training

Describe the Bug

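RandomDistort, Mosaic, and Cutout in PaddleDetection have problems. The brightness adjustment in RandomDistort produces the result shown here (image: 000000000120). The PaddleDetection brightness adjustment code is as follows: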

def apply_brightness(self, img):
    low, high, prob = self.brightness
    if np.random.uniform(0., 1.) < prob:
        return img
    delta = np.random.uniform(low, high)
    img = img.astype(np.float32)
    img += delta
    return img

paddle.vision provides a corresponding brightness adjustment, and using it gives correct results.
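
For comparison, here is a minimal NumPy-only sketch of a multiplicative brightness adjustment with clipping. It assumes the brightness range is meant as a scale factor (as with the brightness_factor argument of paddle.vision.transforms.adjust_brightness) and that prob is the probability of applying the change. The function name and defaults are hypothetical and are not PaddleDetection's API. Note that the code quoted above returns the image unchanged when the random draw is below prob, and adds delta instead of multiplying by it.

```python
import numpy as np


def adjust_brightness_sketch(img, low=0.5, high=1.5, prob=0.5, rng=None):
    """Scale pixel intensities by a random factor and clip to [0, 255].

    Hypothetical helper: low/high are treated as multiplicative factors and
    prob as the probability of applying the change (both are assumptions).
    """
    rng = rng or np.random.default_rng()
    if rng.uniform(0.0, 1.0) >= prob:
        return img  # skipped with probability 1 - prob
    factor = rng.uniform(low, high)
    out = img.astype(np.float32) * factor
    # Clip before casting back so bright pixels saturate instead of wrapping.
    return np.clip(out, 0, 255).astype(np.uint8)


img = np.full((2, 2, 3), 200, dtype=np.uint8)
brighter = adjust_brightness_sketch(img, low=1.5, high=1.5, prob=1.0)
darker = adjust_brightness_sketch(img, low=0.5, high=0.5, prob=1.0)
untouched = adjust_brightness_sketch(img, prob=0.0)
```

With a fixed factor of 1.5 the value 200 saturates at 255, and with 0.5 it becomes 100; with prob=0.0 the input is returned as-is.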

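The Mosaic augmentation results are shown below. In the first image there are multiple labels in the bottom-right corner, and in the second image the labels on the left are not clipped (images: 000000000885, 000000000880).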

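In addition, the Cutout augmentation only processes the image and does not process the labels. PaddleDetection's Mosaic is based on YOLOX. I adapted YOLOX's Mosaic and inserted it into PaddleDetection, where it behaves correctly, as shown below.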

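# Imports this snippet relies on. register_op and BaseOperator are assumed to
# be provided by the PaddleDetection module the class is pasted into.
import math
import random
from collections.abc import Sequence
from numbers import Integral

import cv2
import numpy as np

@register_op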
class Mosaic(BaseOperator):
    """ Mosaic operator for image and gt_bboxes
    The code is based on https://github.com/Megvii-BaseDetection/YOLOX/blob/main/yolox/data/datasets/mosaicdetection.py

    1. get mosaic coords
    2. clip bbox and get mosaic_labels
    3. random_affine augment
    4. Mixup augment as copypaste (optional), not used in tiny/nano

    Args:
        prob (float): probability of using Mosaic, 1.0 as default
        input_dim (list[int]): input shape
        degrees (list[2]): the rotate range to apply, transform range is [min, max]
        translate (list[2]): the translate range to apply, transform range is [min, max]
        scale (list[2]): the scale range to apply, transform range is [min, max]
        shear (list[2]): the shear range to apply, transform range is [min, max]
        enable_mixup (bool): whether to enable Mixup or not
        mixup_prob (float): probability of using Mixup, 1.0 as default
        mixup_scale (list[int]): scale range of Mixup
    """

    # prob: 1.0
    # input_dim: [864, 608]
    # degrees: [0, 0]
    # scale: [0.5, 2.0]
    # shear: [-0.5, 0.5]
    # translate: [-0.1, 0.1]
    # enable_mixup: True

    def __init__(self,
                 prob=1.0,
                 input_dim=[640, 640],
                 degrees=[-10, 10],
                 translate=[-0.1, 0.1],
                 scale=[0.5, 1.5],
                 shear=[-2, 2],
                 enable_mixup=True,
                 mixup_prob=1.0,
                 mixup_scale=[0.5, 1.5]):
        super(Mosaic, self).__init__()
        self.prob = prob
        if isinstance(input_dim, Integral):
            input_dim = [input_dim, input_dim]
        self.input_dim = input_dim
        # Keep the [min, max] ranges; get_aug_params samples from them on every
        # call instead of fixing one value when the operator is constructed.
        self.degrees = degrees
        self.translate = translate
        self.scale = scale
        self.shear = shear
        self.enable_mixup = enable_mixup
        self.mixup_prob = mixup_prob
        self.mixup_scale = mixup_scale

    def get_mosaic_coordinate(self, mosaic_image, mosaic_index, xc, yc, w, h, input_h, input_w):
        # TODO update doc
        # index0 to top left part of image
        if mosaic_index == 0:
            x1, y1, x2, y2 = max(xc - w, 0), max(yc - h, 0), xc, yc
            small_coord = w - (x2 - x1), h - (y2 - y1), w, h
        # index1 to top right part of image
        elif mosaic_index == 1:
            x1, y1, x2, y2 = xc, max(yc - h, 0), min(xc + w, input_w * 2), yc
            small_coord = 0, h - (y2 - y1), min(w, x2 - x1), h
        # index2 to bottom left part of image
        elif mosaic_index == 2:
            x1, y1, x2, y2 = max(xc - w, 0), yc, xc, min(input_h * 2, yc + h)
            small_coord = w - (x2 - x1), 0, w, min(y2 - y1, h)
        # index3 to bottom right part of image
        elif mosaic_index == 3:
            x1, y1, x2, y2 = xc, yc, min(xc + w, input_w * 2), min(input_h * 2, yc + h)  # noqa
            small_coord = 0, 0, min(w, x2 - x1), min(y2 - y1, h)
        return (x1, y1, x2, y2), small_coord

    def get_aug_params(self, value, center=0):
        # A single float v samples from [center - v, center + v];
        # a two-element sequence samples from [value[0], value[1]].
        if isinstance(value, float):
            return random.uniform(center - value, center + value)
        elif len(value) == 2:
            return random.uniform(value[0], value[1])
        else:
            raise ValueError(
                "Affine params should be either a sequence containing two values "
                "or single float values. Got {}".format(value)
            )

    def get_affine_matrix(
        self,
        target_size,
        degrees=10,
        translate=0.1,
        scales=0.1,
        shear=10,
    ):
        twidth, theight = target_size

        # Rotation and Scale
        angle = self.get_aug_params(degrees)
        scale = self.get_aug_params(scales, center=1.0)

        if scale <= 0.0:
            raise ValueError("Argument scale should be positive")

        R = cv2.getRotationMatrix2D(angle=angle, center=(0, 0), scale=scale)

        M = np.ones([2, 3])
        # Shear
        shear_x = math.tan(self.get_aug_params(shear) * math.pi / 180)
        shear_y = math.tan(self.get_aug_params(shear) * math.pi / 180)

        M[0] = R[0] + shear_y * R[1]
        M[1] = R[1] + shear_x * R[0]

        # Translation
        translation_x = self.get_aug_params(translate) * twidth  # x translation (pixels)
        translation_y = self.get_aug_params(translate) * theight  # y translation (pixels)

        M[0, 2] = translation_x
        M[1, 2] = translation_y

        return M, scale

    def apply_affine_to_bboxes(self, targets, target_size, M, scale):
        num_gts = len(targets)

        # warp corner points
        twidth, theight = target_size
        corner_points = np.ones((4 * num_gts, 3))
        corner_points[:, :2] = targets[:, [0, 1, 2, 3, 0, 3, 2, 1]].reshape(
            4 * num_gts, 2
        )  # x1y1, x2y2, x1y2, x2y1
        corner_points = corner_points @ M.T  # apply affine transform
        corner_points = corner_points.reshape(num_gts, 8)

        # create new boxes
        corner_xs = corner_points[:, 0::2]
        corner_ys = corner_points[:, 1::2]
        new_bboxes = (
            np.concatenate(
                (corner_xs.min(1), corner_ys.min(1), corner_xs.max(1), corner_ys.max(1))
            )
            .reshape(4, num_gts)
            .T
        )

        # clip boxes
        new_bboxes[:, 0::2] = new_bboxes[:, 0::2].clip(0, twidth)
        new_bboxes[:, 1::2] = new_bboxes[:, 1::2].clip(0, theight)

        targets[:, :4] = new_bboxes

        return targets

    def random_affine(
        self,
        img,
        targets=(),
        target_size=(640, 640),
        degrees=10,
        translate=0.1,
        scales=0.1,
        shear=10,
    ):
        M, scale = self.get_affine_matrix(target_size, degrees, translate, scales, shear)

        img = cv2.warpAffine(img, M, dsize=target_size, borderValue=(114, 114, 114))

        # Transform label coordinates
        if len(targets) > 0:
            targets = self.apply_affine_to_bboxes(targets, target_size, M, scale)

        return img, targets

    def adjust_box_anns(self, bbox, scale_ratio, padw, padh, w_max, h_max):
        bbox[:, 0::2] = np.clip(bbox[:, 0::2] * scale_ratio + padw, 0, w_max)
        bbox[:, 1::2] = np.clip(bbox[:, 1::2] * scale_ratio + padh, 0, h_max)
        return bbox

    def mixup(self, origin_img, origin_labels, input_dim, cp_index):
        jit_factor = random.uniform(*self.mixup_scale)
        FLIP = random.uniform(0, 1) > 0.5
        img, gt_bbox, gt_class, is_crowd = cp_index['image'], cp_index['gt_bbox'], cp_index['gt_class'], cp_index['is_crowd']
        cp_labels = np.concatenate([gt_bbox, gt_class, is_crowd], 1)

        if len(img.shape) == 3:
            cp_img = np.ones((input_dim[0], input_dim[1], 3), dtype=np.uint8) * 114
        else:
            cp_img = np.ones(input_dim, dtype=np.uint8) * 114

        cp_scale_ratio = min(input_dim[0] / img.shape[0], input_dim[1] / img.shape[1])
        resized_img = cv2.resize(
            img,
            (int(img.shape[1] * cp_scale_ratio), int(img.shape[0] * cp_scale_ratio)),
            interpolation=cv2.INTER_LINEAR,
        )

        cp_img[
            : int(img.shape[0] * cp_scale_ratio), : int(img.shape[1] * cp_scale_ratio)
        ] = resized_img

        cp_img = cv2.resize(
            cp_img,
            (int(cp_img.shape[1] * jit_factor), int(cp_img.shape[0] * jit_factor)),
        )
        cp_scale_ratio *= jit_factor

        if FLIP:
            cp_img = cp_img[:, ::-1, :]

        origin_h, origin_w = cp_img.shape[:2]
        target_h, target_w = origin_img.shape[:2]
        padded_img = np.zeros(
            (max(origin_h, target_h), max(origin_w, target_w), 3), dtype=np.uint8
        )
        padded_img[:origin_h, :origin_w] = cp_img

        x_offset, y_offset = 0, 0
        if padded_img.shape[0] > target_h:
            y_offset = random.randint(0, padded_img.shape[0] - target_h - 1)
        if padded_img.shape[1] > target_w:
            x_offset = random.randint(0, padded_img.shape[1] - target_w - 1)
        padded_cropped_img = padded_img[
            y_offset: y_offset + target_h, x_offset: x_offset + target_w
        ]

        cp_bboxes_origin_np = self.adjust_box_anns(
            cp_labels[:, :4].copy(), cp_scale_ratio, 0, 0, origin_w, origin_h
        )
        if FLIP:
            cp_bboxes_origin_np[:, 0::2] = (
                origin_w - cp_bboxes_origin_np[:, 0::2][:, ::-1]
            )
        cp_bboxes_transformed_np = cp_bboxes_origin_np.copy()
        cp_bboxes_transformed_np[:, 0::2] = np.clip(
            cp_bboxes_transformed_np[:, 0::2] - x_offset, 0, target_w
        )
        cp_bboxes_transformed_np[:, 1::2] = np.clip(
            cp_bboxes_transformed_np[:, 1::2] - y_offset, 0, target_h
        )

        cls_labels = cp_labels[:, 4:6].copy()
        box_labels = cp_bboxes_transformed_np
        labels = np.hstack((box_labels, cls_labels))
        origin_labels = np.vstack((origin_labels, labels))
        origin_img = origin_img.astype(np.float32)
        origin_img = 0.5 * origin_img + 0.5 * padded_cropped_img.astype(np.float32)

        return origin_img.astype(np.uint8), origin_labels

    def __call__(self, sample, context=None):
        if not isinstance(sample, Sequence):
            return sample
        assert len(sample) == 5, \
            "Mosaic needs 5 samples, 4 for mosaic and 1 for mixup."
        if random.random() < self.prob:
            input_h, input_w = self.input_dim[0], self.input_dim[1]
            yc = int(random.uniform(0.5 * input_h, 1.5 * input_h))
            xc = int(random.uniform(0.5 * input_w, 1.5 * input_w))
            mosaic_labels = []
            # Each sample dict has the keys: im_id, h, w, is_crowd, gt_class,
            # gt_bbox, curr_iter, image, im_shape, scale_factor
            for i_mosaic, index in enumerate(sample[:4]):
                img, gt_bbox, gt_class, is_crowd = index['image'], index['gt_bbox'], index['gt_class'], index['is_crowd']
                _labels = np.concatenate([gt_bbox, gt_class, is_crowd], 1)
                h0, w0 = index['h'], index['w']
                scale = min(1. * input_h / h0, 1. * input_w / w0)
                img = cv2.resize(
                        img, (int(w0 * scale), int(h0 * scale)), interpolation=cv2.INTER_LINEAR
                )
                (h, w, c) = img.shape[:3]
                # generate output mosaic image
                if i_mosaic == 0:
                    mosaic_img = np.full((input_h * 2, input_w * 2, c), 114, dtype=np.uint8)
                # suffix l means large image, while s means small image in mosaic aug.
                (l_x1, l_y1, l_x2, l_y2), (s_x1, s_y1, s_x2, s_y2) = self.get_mosaic_coordinate(
                    mosaic_img, i_mosaic, xc, yc, w, h, input_h, input_w
                )
                mosaic_img[l_y1:l_y2, l_x1:l_x2] = img[s_y1:s_y2, s_x1:s_x2]
                padw, padh = l_x1 - s_x1, l_y1 - s_y1
                labels = _labels.copy()
                # Scale the pixel xyxy boxes and shift them into mosaic coordinates
                if len(_labels) > 0:
                    labels[:, 0] = scale * _labels[:, 0] + padw
                    labels[:, 1] = scale * _labels[:, 1] + padh
                    labels[:, 2] = scale * _labels[:, 2] + padw
                    labels[:, 3] = scale * _labels[:, 3] + padh
                mosaic_labels.append(labels)

            if len(mosaic_labels):
                mosaic_labels = np.concatenate(mosaic_labels, 0)
                np.clip(mosaic_labels[:, 0], 0, 2 * input_w, out=mosaic_labels[:, 0])
                np.clip(mosaic_labels[:, 1], 0, 2 * input_h, out=mosaic_labels[:, 1])
                np.clip(mosaic_labels[:, 2], 0, 2 * input_w, out=mosaic_labels[:, 2])
                np.clip(mosaic_labels[:, 3], 0, 2 * input_h, out=mosaic_labels[:, 3])

            mosaic_img, mosaic_labels = self.random_affine(
                mosaic_img,
                mosaic_labels,
                target_size=(input_w, input_h),
                degrees=self.degrees,
                translate=self.translate,
                scales=self.scale,
                shear=self.shear,
            )

            # -----------------------------------------------------------------
            # CopyPaste: https://arxiv.org/abs/2012.07177
            # -----------------------------------------------------------------
            if (
                self.enable_mixup
                and not len(mosaic_labels) == 0
                and random.random() < self.mixup_prob
            ):
                mosaic_img, mosaic_labels = self.mixup(mosaic_img, mosaic_labels, self.input_dim, sample[-1])

            # -----------------------------------------------------------------
            # img_info and img_id are not used for training.
            # They are also hard to be specified on a mosaic image.
            # -----------------------------------------------------------------
            valid_mosaic_labels = []
            for item in mosaic_labels:
                x0, y0, x1, y1 = item[:4]
                if (x1 - x0) * (y1 - y0) > 0:
                    valid_mosaic_labels.append(item)
            valid_mosaic_labels = np.array(valid_mosaic_labels)
            if len(valid_mosaic_labels) > 0:
                sample0 = sample[0]
                sample0['image'] = mosaic_img.astype(np.uint8)  # can not be float32
                sample0['h'] = float(mosaic_img.shape[0])
                sample0['w'] = float(mosaic_img.shape[1])
                sample0['im_shape'][0] = sample0['h']
                sample0['im_shape'][1] = sample0['w']
                sample0['gt_bbox'] = valid_mosaic_labels[:, :4]
                sample0['gt_class'] = valid_mosaic_labels[:, 4:5].astype(np.float32)
                sample0['is_crowd'] = valid_mosaic_labels[:, 5:6].astype(np.float32).reshape([-1, 1])
                return sample0
            else:
                return sample[0]
        else:
            return sample[0]
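
To make the coordinate step concrete, here is a standalone sketch of the same arithmetic as get_mosaic_coordinate above, followed by a small worked example. The function name mosaic_coordinate and the sample numbers are illustrative only.

```python
def mosaic_coordinate(index, xc, yc, w, h, input_h, input_w):
    """Return (paste region on the 2x canvas, crop region of the tile).

    index: 0 top-left, 1 top-right, 2 bottom-left, 3 bottom-right.
    (xc, yc) is the mosaic center; (w, h) is the resized tile size.
    """
    if index == 0:
        x1, y1, x2, y2 = max(xc - w, 0), max(yc - h, 0), xc, yc
        small = (w - (x2 - x1), h - (y2 - y1), w, h)
    elif index == 1:
        x1, y1, x2, y2 = xc, max(yc - h, 0), min(xc + w, input_w * 2), yc
        small = (0, h - (y2 - y1), min(w, x2 - x1), h)
    elif index == 2:
        x1, y1, x2, y2 = max(xc - w, 0), yc, xc, min(input_h * 2, yc + h)
        small = (w - (x2 - x1), 0, w, min(y2 - y1, h))
    elif index == 3:
        x1, y1, x2, y2 = xc, yc, min(xc + w, input_w * 2), min(input_h * 2, yc + h)
        small = (0, 0, min(w, x2 - x1), min(y2 - y1, h))
    else:
        raise ValueError("index must be 0, 1, 2 or 3")
    return (x1, y1, x2, y2), small


# 640x640 input, center at (500, 400), every tile resized to 640x480.
regions = [mosaic_coordinate(i, 500, 400, 640, 480, 640, 640) for i in range(4)]
for large, small in regions:
    # The paste region and the crop region always have the same size.
    assert large[2] - large[0] == small[2] - small[0]
    assert large[3] - large[1] == small[3] - small[1]
```

Because the two regions match in size, the label shift used in __call__ (padw = l_x1 - s_x1, padh = l_y1 - s_y1) moves each box by exactly the offset between the tile and its position on the canvas. Boxes that end up outside the pasted region still need clipping afterwards.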

Environment

OS Linux
PaddleDetection=2.6

Bug description confirmation

[X] I confirm that the bug replication steps, code change instructions, and environment information have been provided, and the problem can be reproduced.

Are you willing to submit a PR?

[X] I'd like to help by submitting a PR!

GuoQuanhao commented 1 year ago

The augmentations can be visualized by setting - DebugVisibleImage: {} in the yml file.
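
A hedged sketch of where that entry might go, assuming the usual PaddleDetection reader layout with a TrainReader section and a sample_transforms list. The other transforms shown are placeholders, not a recommended pipeline.

```yaml
TrainReader:
  sample_transforms:
    - Decode: {}
    - RandomDistort: {}        # placeholder: any augmentation you want to inspect
    - DebugVisibleImage: {}    # placed after the augmentations to be visualized
```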

GuoQuanhao commented 1 year ago

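After verification, all of the following augmentations have problems: ('TranslateX_BBox', 1.0, 8) ('TranslateY_BBox', 1.0, 8) ('BBox_Cutout', 1.0, 30) ('Rotate_BBox', 1.0, 30) ('Cutout', 1.0, 30) ('ShearY_BBox', 1.0, 30). The bbox-related augmentations do not account for boxes being truncated or cropped out after the image is transformed. In addition, gt_class and is_crowd should be carried through the augmentation and transformed together with the boxes, so that they stay in sync when a box is clipped or disappears.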
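
A minimal sketch of the kind of synchronized handling described here, assuming pixel xyxy boxes and (N, 1) label arrays as in the Mosaic code earlier in this issue. The helper name clip_and_filter_boxes and the min_size threshold are hypothetical, not part of PaddleDetection.

```python
import numpy as np


def clip_and_filter_boxes(gt_bbox, gt_class, is_crowd, width, height, min_size=1.0):
    """Clip xyxy boxes to the image and drop the ones that vanish.

    Hypothetical helper: gt_class and is_crowd are filtered with the same
    mask so that they stay aligned with gt_bbox.
    """
    boxes = gt_bbox.astype(np.float32).copy()
    boxes[:, 0::2] = np.clip(boxes[:, 0::2], 0, width)   # x1, x2
    boxes[:, 1::2] = np.clip(boxes[:, 1::2], 0, height)  # y1, y2
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    keep = (w >= min_size) & (h >= min_size)
    return boxes[keep], gt_class[keep], is_crowd[keep]


gt_bbox = np.array([[-20, 10, 50, 60],      # partly outside: gets clipped
                    [120, 30, 180, 90],     # fully outside: gets dropped
                    [10, 10, 40, 40]],      # inside: unchanged
                   dtype=np.float32)
gt_class = np.array([[1], [2], [3]], dtype=np.int32)
is_crowd = np.zeros((3, 1), dtype=np.int32)
boxes, classes, crowd = clip_and_filter_boxes(gt_bbox, gt_class, is_crowd, 100, 100)
```

Applying one boolean mask to all three arrays is the point of the sketch: a box-only transform that drops or reorders boxes without touching gt_class and is_crowd leaves the labels misaligned.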

GuoQuanhao commented 1 year ago

The following class also does not handle gt_class:

@register_op
class AutoAugment(BaseOperator):

Every augmentation in PaddleDetection that involves box transforms has this problem.

GuoQuanhao commented 1 year ago

I have finished the fixes, using these two references:

https://github.com/ZhenglinZhou/Data_Augmentation_Zoo_for_Object_Detection/tree/master/augmentation_zoo

https://www.kaggle.com/code/kaushal2896/data-augmentation-tutorial-basic-cutout-mixup

I can submit a PR if there is demand 😄

nissansz commented 1 year ago

When training layout analysis with Paddle, do the PubLayNet labels need to be converted for Paddle, or can they be used directly after downloading?

GuoQuanhao commented 1 year ago

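> When training layout analysis with Paddle, do the PubLayNet labels need to be converted for Paddle, or can they be used directly after downloading?

I haven't used that dataset. I used CDLA.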

nissansz commented 1 year ago

What format are the CDLA labels in? Could you post an example?

PaddlePaddle / PaddleDetection

Data augmentation issues #8517
