albumentations-team / albumentations

Fast and flexible image augmentation library. Paper about the library: https://www.mdpi.com/2078-2489/11/2/125
https://albumentations.ai
MIT License

Add a transformation that pads an image along the shorter dimension to make it a square #1247

Open issamemari opened 2 years ago

issamemari commented 2 years ago

Hello!

Often in my computer vision projects I need to implement something that makes an image square without changing the aspect ratio. To do that, I simply pad the image along the shorter spatial dimension such that the image content is centered.
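The centering arithmetic described above can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names `square_padding` and `pad_to_square` are hypothetical, the image is modeled as a list of equal-length rows, and any odd leftover pixel is assumed to go to the bottom or right.

```python
def square_padding(height, width):
    """Return (top, bottom, left, right) padding that centers the image in a square."""
    size = max(height, width)
    pad_h = size - height
    pad_w = size - width
    top = pad_h // 2   # any odd leftover pixel goes to the bottom
    left = pad_w // 2  # any odd leftover pixel goes to the right
    return top, pad_h - top, left, pad_w - left


def pad_to_square(rows, fill=0):
    """Pad a 2-D image, given as a list of equal-length rows, to a square."""
    height, width = len(rows), len(rows[0])
    size = max(height, width)
    top, bottom, left, right = square_padding(height, width)
    # Pad each row horizontally, then add full-width rows above and below.
    padded = [[fill] * left + list(row) + [fill] * right for row in rows]
    above = [[fill] * size for _ in range(top)]
    below = [[fill] * size for _ in range(bottom)]
    return above + padded + below
```

Only the shorter dimension ever receives padding, so the aspect ratio of the original content is unchanged.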

As far as I know there isn't such a transform in Albumentations. PadIfNeeded is the closest transform to what I need, but it requires the desired output dimensions as parameters, which doesn't quite work for me because the desired dimensions depend on the actual dimensions of the input image.

I have implemented a custom Albumentations transform that does this, and I'm using it in my projects. I'd be happy to contribute a PR to add it.

ternaus commented 2 years ago

Feel free to contribute it.

Something like PadToSquare

zzhanghub commented 10 months ago

Have there been any updates on this recently? I would also really like a PadToSquare feature; I think it is a common image-processing step.

zzhanghub commented 10 months ago

I've implemented a temporary version, though I'm not sure whether it has any hidden problems. @ternaus

import cv2
import albumentations as A


class PadToSquare(A.DualTransform):
    """Pad image to a square shape (max(height, width) x max(height, width)), content centered.

    Assumes the albumentations 1.x DualTransform interface: params carries 'rows' and 'cols',
    bboxes arrive normalized as (x_min, y_min, x_max, y_max), and keypoints as pixel (x, y, ...).
    """

    def __init__(self, always_apply=False, p=1.0, value=(128, 128, 128, 128)):
        super().__init__(always_apply, p)
        self.value = value  # tuple rather than a mutable list default

    @staticmethod
    def _padding(rows, cols):
        # Split the padding evenly; any odd leftover pixel goes to the bottom/right.
        size = max(rows, cols)
        top, left = (size - rows) // 2, (size - cols) // 2
        return size, top, size - rows - top, left, size - cols - left

    def apply(self, img, **params):
        # No resize is needed: the longest side already equals the target size.
        _, top, bottom, left, right = self._padding(*img.shape[:2])
        return cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=self.value)

    def apply_to_bbox(self, bbox, **params):
        rows, cols = params['rows'], params['cols']
        size, top, _, left, _ = self._padding(rows, cols)
        x_min, y_min, x_max, y_max = bbox[:4]
        # De-normalize, shift by the padding offset, re-normalize by the new side length.
        return (
            (x_min * cols + left) / size,
            (y_min * rows + top) / size,
            (x_max * cols + left) / size,
            (y_max * rows + top) / size,
        ) + tuple(bbox[4:])

    def apply_to_keypoint(self, keypoint, **params):
        _, top, _, left, _ = self._padding(params['rows'], params['cols'])
        x, y = keypoint[:2]
        # Keypoints are in pixels, so they only shift by the top/left padding.
        return (x + left, y + top) + tuple(keypoint[2:])

    def get_transform_init_args_names(self):
        return ('value',)
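
One potential risk is that boxes and keypoints must move with the image content once padding is added. Below is a library-free sketch of that coordinate mapping, useful for sanity-checking a transform like this one. It assumes boxes are normalized (x_min, y_min, x_max, y_max) values in [0, 1] and keypoints are pixel coordinates; the function names `shift_bbox` and `shift_keypoint` are hypothetical.

```python
def shift_bbox(bbox, rows, cols):
    """Map a normalized (x_min, y_min, x_max, y_max) box into the padded square."""
    size = max(rows, cols)
    top = (size - rows) // 2
    left = (size - cols) // 2
    x_min, y_min, x_max, y_max = bbox
    # De-normalize with the old size, add the offset, re-normalize with the new size.
    return (
        (x_min * cols + left) / size,
        (y_min * rows + top) / size,
        (x_max * cols + left) / size,
        (y_max * rows + top) / size,
    )


def shift_keypoint(x, y, rows, cols):
    """Shift a pixel-coordinate keypoint by the top/left padding."""
    size = max(rows, cols)
    return x + (size - cols) // 2, y + (size - rows) // 2


# A box covering a whole 100x200 (rows x cols) image ends up in the middle band of the square.
print(shift_bbox((0.0, 0.0, 1.0, 1.0), rows=100, cols=200))  # (0.0, 0.25, 1.0, 0.75)
print(shift_keypoint(10, 20, rows=100, cols=200))            # (10, 70)
```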