albumentations-team / albumentations

Fast and flexible image augmentation library. Paper about the library: https://www.mdpi.com/2078-2489/11/2/125
https://albumentations.ai
MIT License
14.02k stars 1.63k forks

Add CopyAndPaste transform #1225

Open ternaus opened 2 years ago

ternaus commented 2 years ago
  1. Specify the object as PNG with the empty background.
  2. Specify background image.
  3. Cut out a region in the background image that is larger than the object.
  4. Paste the object to the cut.
  5. Use Poisson Blending to inpaint space between pasted object and image.
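The steps above could be sketched roughly like this (the function name and `margin` parameter are illustrative; step 5 is left as a comment, since Poisson blending would typically be done with OpenCV's `cv2.seamlessClone`):

```python
import numpy as np

def copy_and_paste(background, obj_rgba, top_left, margin=4):
    """Paste an RGBA object onto a background, cutting a slightly larger hole first.

    background: HxWx3 uint8 image.
    obj_rgba:   hxwx4 uint8 PNG-style array; alpha == 0 marks the empty background.
    top_left:   (row, col) where the object's top-left corner lands.
    margin:     how much larger the cut-out region is than the object.
    """
    out = background.copy()
    h, w = obj_rgba.shape[:2]
    r, c = top_left

    # Step 3: cut out a region larger than the object (here: fill with zeros).
    r0, c0 = max(r - margin, 0), max(c - margin, 0)
    r1 = min(r + h + margin, out.shape[0])
    c1 = min(c + w + margin, out.shape[1])
    out[r0:r1, c0:c1] = 0

    # Step 4: paste the object where its alpha channel is non-zero.
    alpha = obj_rgba[..., 3:4].astype(np.float32) / 255.0
    region = out[r:r + h, c:c + w].astype(np.float32)
    out[r:r + h, c:c + w] = (alpha * obj_rgba[..., :3] + (1 - alpha) * region).astype(np.uint8)

    # Step 5 (not shown): inpaint the zeroed gap between the object and the image,
    # e.g. Poisson blending via cv2.seamlessClone(src, dst, mask, center, cv2.NORMAL_CLONE).
    return out
```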
i-aki-y commented 2 years ago

@ternaus Hi, I'm interested in the feature.

How can we specify the pasted object in an Albumentations transform? Do we need to introduce a new target keyword like 'paste_image':

transform(image=image, paste_image=paste_image, ...)

Or should we sample segments from the same target image, as yolov5 does?

https://github.com/ultralytics/yolov5/blob/15e82d296720d4be344bf42a34d60ffd57b3eb28/utils/dataloaders.py#L706

Dipet commented 2 years ago

I think we could support this:

i-aki-y commented 1 year ago

@Dipet thank you for your reply.

Do you mean that the paste_image_key is used to set a path to the PNG file as a target?

transform(image=image, paste_image_key="path/to/png", ...)

If so, I think using paste_image_dir is a better choice, since paste_image_dir can be passed as a constructor parameter. That way we avoid introducing new targets and follow the standard usage.

Ex.

transform = Compose([
    CopyAndPaste(paste_image_dir=objects_dir, …),
...
])

transform(image=image, …)   # we can still use only the standard targets: bboxes and masks.

To get a feel for it, I made a working example in PR #1297 (still a work in progress).
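A hypothetical sketch of what such a directory-based transform could look like (the class name and the `.npy` loading are illustrative only; a real transform would read PNGs with an alpha channel and integrate with albumentations' base classes):

```python
import os
import random
import numpy as np

class CopyAndPasteSketch:
    """Hypothetical transform: objects come from a directory given at
    construction, so __call__ only needs the standard targets."""

    def __init__(self, paste_image_dir, p=1.0):
        self.paths = [os.path.join(paste_image_dir, f)
                      for f in sorted(os.listdir(paste_image_dir))]
        self.p = p

    def __call__(self, image, **targets):
        if not self.paths or random.random() > self.p:
            return {"image": image, **targets}
        # Load one object (stored here as .npy arrays for simplicity).
        obj = np.load(random.choice(self.paths))
        h, w = obj.shape[:2]
        out = image.copy()
        out[:h, :w] = obj[..., :3]  # naive paste at the top-left corner
        return {"image": out, **targets}
```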

zetyquickly commented 5 months ago

Feature description

Add CopyAndPaste Augmentation from https://arxiv.org/abs/2012.07177

Used in github.com/ultralytics/yolov5/blob/ac6c4383bc0c7a2a4f7ca18f8733821b49e916bd/utils/augmentations.py#L19

I checked the yolov5 code here. It looks like they don't implement the method the paper describes.

Paper quote:

Our approach for generating new data using Copy-Paste is very simple. We randomly select two images and apply random scale jittering and random horizontal flipping on each of them. Then we select a random subset of objects from one of the images and paste them onto the other image. Lastly, we adjust the ground-truth annotations accordingly: we remove fully occluded objects and update the masks and bounding boxes of partially occluded objects.
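The paper's core step could be sketched like this (function and argument names are mine; random scale jittering, flipping, and the random subset selection are assumed to happen before the call):

```python
import numpy as np

def paste_instances(img_a, masks_a, img_b, paste_masks):
    """Sketch of the core Copy-Paste step from arXiv:2012.07177.

    img_a, img_b: HxWx3 uint8 arrays of equal shape.
    masks_a:      list of HxW bool instance masks for image A.
    paste_masks:  randomly chosen subset of image B's instance masks.
    """
    pasted = np.zeros(img_a.shape[:2], dtype=bool)
    out = img_a.copy()
    for m in paste_masks:
        out[m] = img_b[m]          # copy B's pixels under each chosen mask
        pasted |= m
    # Adjust A's ground truth: shrink partially occluded masks,
    # drop fully occluded objects, then add the pasted instances.
    new_masks = [m & ~pasted for m in masks_a]
    new_masks = [m for m in new_masks if m.any()]
    return out, new_masks + list(paste_masks)
```

The random subset selection the paper describes would then be something like `paste = [m for m in masks_b if rng.random() < 0.5]`; bounding boxes can be recomputed from the updated masks afterwards.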

What they do:

zetyquickly commented 5 months ago

Regarding this PR.

I haven't checked the implementation in detail, but there are two points so far:

ternaus commented 5 months ago

@zetyquickly

Loading everything into memory versus loading from disk is a matter of personal preference.

The latest transform that needed to load extra data from disk looks like this:

https://github.com/albumentations-team/albumentations/blob/main/albumentations/augmentations/mixing/transforms.py

We have a pair of:

reference_data (Optional[Union[Generator[ReferenceImage, None, None], Sequence[Any]]]):
            A sequence or generator of dictionaries containing the reference data for mixing
            If None or an empty sequence is provided, no operation is performed and a warning is issued.

and

read_fn (Callable[[ReferenceImage], Dict[str, Any]]):
            A function to process items from reference_data. It should accept items from reference_data
            and return a dictionary containing processed data:
                - The returned dictionary must include an 'image' key with a numpy array value.
                - It may also include 'mask' and 'global_label', each associated with numpy array values.
            Defaults to a function that assumes input dictionary contains numpy arrays and directly returns it.

reference_data can be a generator, or a sequence of ids, paths, or images loaded into memory,

and read_fn is a function that maps a reference_data element to something the transform uses.

=> if a person wants to load everything into memory beforehand, load it into reference_data and use `lambda x: x` as read_fn;

if you want to read on the fly, let all the work happen in read_fn.
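As a toy illustration of the two modes (the `apply_mix` stand-in below is hypothetical, not the actual albumentations API):

```python
import numpy as np

def apply_mix(reference_data, read_fn):
    """Toy stand-in for a transform taking the reference_data/read_fn pair:
    pick one reference item and let read_fn turn it into {'image': ndarray, ...}."""
    item = next(iter(reference_data))
    data = read_fn(item)
    assert isinstance(data["image"], np.ndarray)
    return data["image"]

# Mode 1: everything preloaded into memory -> identity read_fn.
preloaded = [{"image": np.zeros((2, 2, 3), dtype=np.uint8)}]
img1 = apply_mix(preloaded, lambda x: x)

# Mode 2: reference_data holds ids; read_fn loads on the fly
# (from a dict here; real code would decode an image file).
store = {"a": np.ones((2, 2, 3), dtype=np.uint8)}
img2 = apply_mix(["a"], lambda key: {"image": store[key]})
```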


It looks like a lot of different functionality is added in that PR; I would probably split it into separate PRs.