Feature request: Object-based augmentation via cropping and pasting

NesterukSergey commented 3 years ago

Hi!

I propose adding transforms that cut objects out of different images using their segmentation masks and paste them onto a new background. The idea is described here and in several other research papers, and a demo can be found here.

Object-based augmentation example

Why is it useful?

It adds extra variability to training images by combining multiple objects in one scene and applying augmentations separately to the objects, the background, and the whole scene.
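To make the cut-and-paste step concrete, here is a minimal NumPy sketch. The function name `paste_object` and its arguments are hypothetical and not part of the proposed interface; it assumes a single binary mask per object, non-negative paste coordinates, and that the image and background have the same number of channels.

```python
import numpy as np


def paste_object(background, image, mask, top, left):
    """Paste the masked pixels of `image` onto a copy of `background`.

    `mask` is a 2D array that is non-zero where the object is.
    `top` and `left` give where the object's tight crop lands in the scene.
    """
    scene = background.copy()
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return scene  # empty mask: nothing to paste

    # Tight crop around the object.
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    crop = image[y0:y1, x0:x1]
    crop_mask = mask[y0:y1, x0:x1].astype(bool)

    # Clip the crop so it does not run past the scene borders.
    h = min(crop.shape[0], scene.shape[0] - top)
    w = min(crop.shape[1], scene.shape[1] - left)
    if h <= 0 or w <= 0:
        return scene

    # `region` is a view into `scene`, so the assignment edits the scene.
    region = scene[top:top + h, left:left + w]
    keep = crop_mask[:h, :w]
    region[keep] = crop[:h, :w][keep]
    return scene
```

Only the pixels under the mask are copied, so the background stays visible around the object's outline.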

What are the cases?

It is especially useful for few-shot learning problems, where little training data is available. In particular, it has been shown to work well in the agricultural domain and for remote sensing problems.
It enables solving problems "in the wild" using only "lab" images, by explicitly controlling the number of objects, their overlap, the background, and noise.
It allows preparing datasets for object counting and, in some cases, object detection, multiclass classification, instance segmentation, semantic segmentation, multi-task learning, etc., even if the original dataset was intended for instance segmentation only.
Other cases TBD
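As an illustration of the dataset-preparation case above, the sketch below derives counting, detection, and semantic segmentation targets from a stack of instance masks. The helper `derive_targets` and the layout of its return value are assumptions made for this example; class ids are assumed to be positive integers, with 0 reserved for the background.

```python
import numpy as np


def derive_targets(instance_masks, class_ids):
    """Derive extra targets from instance masks of shape (N, H, W)."""
    n, h, w = instance_masks.shape
    semantic = np.zeros((h, w), dtype=np.int32)
    bboxes = []
    for mask, class_id in zip(instance_masks, class_ids):
        mask = mask.astype(bool)
        # Later instances overwrite earlier ones where they overlap.
        semantic[mask] = class_id
        ys, xs = np.nonzero(mask)
        if ys.size:
            # [x0, y0, x1, y1], with x1 and y1 exclusive.
            bboxes.append([int(xs.min()), int(ys.min()),
                           int(xs.max()) + 1, int(ys.max()) + 1])
    return {
        "count": n,                        # object counting
        "bboxes": bboxes,                  # object detection
        "semantic_mask": semantic,         # semantic segmentation
        "classes": sorted(set(class_ids))  # multiclass classification
    }
```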

How difficult is it to add it?

The main point is that it doesn't require changes to existing code and can be implemented as a wrapper.
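A rough sketch of what such a wrapper could look like is shown here. `SceneWrapper` is a hypothetical name, the transforms are treated as plain callables from array to array, and for brevity each object image and mask is assumed to already have the same height and width as the background.

```python
import numpy as np


def _identity(array):
    return array


class SceneWrapper:
    """Applies separate transforms to objects, background, and scene."""

    def __init__(self, object_transforms=None,
                 background_transforms=None, scene_transforms=None):
        # Any callable that maps an array to an array can be plugged in.
        self.object_transforms = object_transforms or _identity
        self.background_transforms = background_transforms or _identity
        self.scene_transforms = scene_transforms or _identity

    def __call__(self, objects, background):
        # `objects` is an iterable of (image, mask) pairs.
        scene = self.background_transforms(background.copy())
        for image, mask in objects:
            image = self.object_transforms(image)
            keep = mask.astype(bool)
            scene[keep] = image[keep]
        return self.scene_transforms(scene)
```

Because the wrapper only calls the transforms it is given, existing transform code would not need to change. Geometric object transforms would also have to be applied to the mask, which this sketch leaves out.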

Limitations

This method assumes that instance segmentation masks are available for the objects of interest. If only bounding boxes are provided, we can still copy and paste the whole box, like here.
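For the bounding-box fallback, a minimal sketch could look like the following. `paste_box` is a hypothetical helper; it assumes boxes in [x0, y0, x1, y1] pixel format with exclusive x1 and y1, and non-negative paste coordinates.

```python
import numpy as np


def paste_box(background, image, bbox, top, left):
    """Copy the rectangle `bbox` from `image` into a copy of `background`.

    Returns the new scene and the box of the pasted patch in scene
    coordinates, or None for the box if the patch falls outside the scene.
    """
    x0, y0, x1, y1 = bbox
    patch = image[y0:y1, x0:x1]
    scene = background.copy()

    # Clip the patch to the scene borders.
    h = min(patch.shape[0], scene.shape[0] - top)
    w = min(patch.shape[1], scene.shape[1] - left)
    if h <= 0 or w <= 0:
        return scene, None

    scene[top:top + h, left:left + w] = patch[:h, :w]
    return scene, [left, top, left + w, top + h]
```

Unlike the mask-based variant, the whole rectangle is copied, so some of the source image's background is carried into the new scene.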

Suggested functional interface


from typing import Callable, Dict, Iterable, List, Optional, Tuple, Union

import numpy as np


class ObjectBasedAugmentor:
    '''
    Generates scenes based on objects from multiple images as described in https://arxiv.org/abs/2102.12295. 
    Can take input sources either during initialization for more automated work or
    during each call for more controllable behavior. If only bounding boxes are provided,
    applies copy-pasting of the whole box as shown in https://arxiv.org/abs/1906.11172.

    Args:
        images (Union[Iterable[np.ndarray], List[str], None]): iterable of np.ndarray images or
                list of image paths or None. If None, should be specified in object call.
                Original images with objects of interest.
        instance_masks (Union[Iterable[np.ndarray], List[str], None]): iterable of np.ndarray images or
                list of image paths or None. If None, should be specified in object call.
                Instance masks for the corresponding images. One layer per instance.
        backgrounds (Union[Iterable[np.ndarray], List[str], None]):
                iterable of np.ndarray images or list of image paths or None.
                If None, should be specified in object call. Scene backgrounds.
                Must have the same number of channels as image.
        additional_targets (Dict[str, np.ndarray]): dictionary with additional masks to transform.
        unique_color_masks (List[str]): list with names of masks from additional_targets.keys()
                for which unique colors for every original color should be generated. If mask
                is not in list, colors remain original after pasting objects on new scene.
        bboxes (List[int]): bounding boxes in [x0, y0, x1, y1] format,
                ranging from 0 to W and 0 to H.
        keypoints (np.ndarray): keypoints to transform along with the scene.
        object_transforms (Callable[[np.ndarray], np.ndarray]): transforms or their composition
                to apply to each object independently.
        background_transforms (Callable[[np.ndarray], np.ndarray]): transforms or their composition
                to apply to the background.
        scene_transforms (Callable[[np.ndarray], np.ndarray]): transforms or their composition
                to apply to the whole scene after pasting all objects.
        preprocess_dataset (bool): if True, dataset statistics will be calculated during init.
                Enables using class_proba.
        return_semantic (bool): if True, return additional semantic mask. 
        add_bboxes (bool): if True, calculates bounding boxes from the segmentation masks.
        objects_per_scene (int): the number of pasted objects in the final scene.
        overlap_ratio (float): the ratio of overlap between objects in the final scene. [0...].
        packaging_rule (str): the algorithm to place objects on the scene. 
                One of ['smallest', 'random', 'grid'].
        result_size (Union[int, Tuple[int, int], str]): the way to process the size of the resulting scene.
                Original size if 'as_is'. [N, N] if N. [N, M] if (N, M).
        class_proba (Optional[np.ndarray]): defines the probability to choose object from each class.
                Must have preprocess_dataset enabled.
        adjust_sizes (bool): if True, normalizes sizes of pasted objects.
    '''

    def __init__(self,
                images: Union[Iterable[np.ndarray], List[str], None]=None,
                instance_masks: Union[Iterable[np.ndarray], List[str], None]=None,
                backgrounds: Union[Iterable[np.ndarray], List[str], None]=None,
                bboxes: Optional[List[int]]=None,

                additional_targets: Optional[Dict[str, np.ndarray]]=None,
                unique_color_masks: Optional[List[str]]=None,

                keypoints: Optional[np.ndarray]=None,

                object_transforms: Optional[Callable[[np.ndarray], np.ndarray]]=None,
                background_transforms: Optional[Callable[[np.ndarray], np.ndarray]]=None,
                scene_transforms: Optional[Callable[[np.ndarray], np.ndarray]]=None,

                preprocess_dataset: bool=False,

                return_semantic: bool=False,
                add_bboxes: bool=False,

                objects_per_scene: int=4,
                overlap_ratio: float=.0,
                packaging_rule: str='smallest',
                result_size: Union[int, Tuple[int, int], str]='as_is',
                class_proba: Optional[np.ndarray]=None,
                adjust_sizes: bool=False):
        pass

    def __call__(self,
                images: Union[Iterable[np.ndarray], List[str], None]=None,
                instance_masks: Union[Iterable[np.ndarray], List[str], None]=None,
                backgrounds: Union[Iterable[np.ndarray], List[str], None]=None,
                bboxes: Optional[List[int]]=None,

                additional_targets: Optional[Dict[str, np.ndarray]]=None,
                unique_color_masks: Optional[List[str]]=None,

                keypoints: Optional[np.ndarray]=None,

                object_transforms: Optional[Callable[[np.ndarray], np.ndarray]]=None,
                background_transforms: Optional[Callable[[np.ndarray], np.ndarray]]=None,
                scene_transforms: Optional[Callable[[np.ndarray], np.ndarray]]=None,

                return_semantic: bool=False,
                add_bboxes: bool=False,

                objects_per_scene: int=4,
                overlap_ratio: float=.0,
                packaging_rule: str='smallest',
                result_size: Union[int, Tuple[int, int], str]='as_is',
                adjust_sizes: bool=False
                ) -> Dict[str, np.ndarray]:
        '''
        Returns:
            result (Dict[str, np.ndarray]): dictionary with scene, transformed masks, 
                    bounding boxes, and keypoints.
        '''
        pass

It will also require adding some utils for copy-pasting objects.
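Two of the utilities this could involve are sketched below, purely as an illustration. The names `resolve_result_size` and `overlap_ratio` are hypothetical, and the overlap definition used here (the share of a candidate object's pixels that land on already occupied pixels) is an assumption, since the proposal does not pin one down.

```python
import numpy as np


def resolve_result_size(result_size, background_shape):
    """Turn 'as_is', N, or (N, M) into an explicit (height, width)."""
    if isinstance(result_size, str):
        if result_size != "as_is":
            raise ValueError(f"unsupported result_size: {result_size!r}")
        return int(background_shape[0]), int(background_shape[1])
    if isinstance(result_size, int):
        return result_size, result_size
    n, m = result_size
    return int(n), int(m)


def overlap_ratio(occupied, candidate):
    """Fraction of `candidate` pixels that fall on `occupied` pixels."""
    candidate = candidate.astype(bool)
    area = int(candidate.sum())
    if area == 0:
        return 0.0
    covered = int((occupied.astype(bool) & candidate).sum())
    return covered / area
```

A placement routine could then reject a candidate position whenever `overlap_ratio` exceeds the configured limit and try another one.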

creafz commented 3 years ago

Hey @NesterukSergey, thanks. Looks good to me! We can proceed with implementing this feature in Albumentations.

I propose to create a new package augmentors in the albumentations directory and place all the required code into this package.

aliab3d commented 3 years ago

The idea is in line with the copy-paste augmentation method, which achieves very promising performance improvements. This would be a great addition to the Albumentations augmentations.
