albumentations-team / albumentations

Fast and flexible image augmentation library. Paper about the library: https://www.mdpi.com/2078-2489/11/2/125
https://albumentations.ai
MIT License
14.02k stars 1.63k forks

Add CopyAndPaste transform #1225

Open ternaus opened 2 years ago

ternaus commented 2 years ago
  1. Specify the object as PNG with the empty background.
  2. Specify background image.
  3. Cut out a region in the background image that is larger than the object.
  4. Paste the object to the cut.
  5. Use Poisson Blending to inpaint space between pasted object and image.
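The steps above could be sketched roughly like this (the function name and `margin` parameter are illustrative; step 5 is left as a comment, since Poisson blending would typically be done with OpenCV's `cv2.seamlessClone`):

```python
import numpy as np

def copy_and_paste(background, obj_rgba, top_left, margin=4):
    """Paste an RGBA object onto a background, cutting a slightly larger hole first.

    background: HxWx3 uint8 image.
    obj_rgba:   hxwx4 uint8 PNG-style array; alpha == 0 marks the empty background.
    top_left:   (row, col) where the object's top-left corner lands.
    margin:     how much larger the cut-out region is than the object.
    """
    out = background.copy()
    h, w = obj_rgba.shape[:2]
    r, c = top_left

    # Step 3: cut out a region larger than the object (here: fill with zeros).
    r0, c0 = max(r - margin, 0), max(c - margin, 0)
    r1 = min(r + h + margin, out.shape[0])
    c1 = min(c + w + margin, out.shape[1])
    out[r0:r1, c0:c1] = 0

    # Step 4: paste the object where its alpha channel is non-zero.
    alpha = obj_rgba[..., 3:4].astype(np.float32) / 255.0
    region = out[r:r + h, c:c + w].astype(np.float32)
    out[r:r + h, c:c + w] = (alpha * obj_rgba[..., :3] + (1 - alpha) * region).astype(np.uint8)

    # Step 5 (not shown): inpaint the zeroed gap between the object and the image,
    # e.g. Poisson blending via cv2.seamlessClone(src, dst, mask, center, cv2.NORMAL_CLONE).
    return out
```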
i-aki-y commented 2 years ago

@ternaus Hi, I'm interested in the feature.

How can we specify the pasted object in an Albumentations transform? Do we need to introduce a new target keyword like 'paste_image':

transform(image=image, paste_image=paste_image, ...)

Or should we sample segments from the same target image, as yolov5 does?

https://github.com/ultralytics/yolov5/blob/15e82d296720d4be344bf42a34d60ffd57b3eb28/utils/dataloaders.py#L706

Dipet commented 2 years ago

I think we could support this:

i-aki-y commented 1 year ago

@Dipet thank you for your reply.

Do you mean that the paste_image_key is used to set a path to the PNG file as a target?

transform(image=image, paste_image_key="path/to/png", ...)

If so, I think using paste_image_dir is a better choice, since paste_image_dir can be passed as a constructor parameter. That way we avoid introducing new targets and follow the standard usage.

Ex.

transform = Compose([
    CopyAndPaste(paste_image_dir=objects_dir, …),
...
])

transform(image=image, …)   # we can still use only the standard targets: bboxes and masks.

To get a feel for it, I made a working example in PR #1297 (still a work in progress).
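A hypothetical sketch of what such a directory-based transform could look like (the class name and the `.npy` loading are illustrative only; a real transform would read PNGs with an alpha channel and integrate with albumentations' base classes):

```python
import os
import random
import numpy as np

class CopyAndPasteSketch:
    """Hypothetical transform: objects come from a directory given at
    construction, so __call__ only needs the standard targets."""

    def __init__(self, paste_image_dir, p=1.0):
        self.paths = [os.path.join(paste_image_dir, f)
                      for f in sorted(os.listdir(paste_image_dir))]
        self.p = p

    def __call__(self, image, **targets):
        if not self.paths or random.random() > self.p:
            return {"image": image, **targets}
        # Load one object (stored here as .npy arrays for simplicity).
        obj = np.load(random.choice(self.paths))
        h, w = obj.shape[:2]
        out = image.copy()
        out[:h, :w] = obj[..., :3]  # naive paste at the top-left corner
        return {"image": out, **targets}
```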

zetyquickly commented 5 months ago

Feature description

Add CopyAndPaste Augmentation from https://arxiv.org/abs/2012.07177

Used in github.com/ultralytics/yolov5/blob/ac6c4383bc0c7a2a4f7ca18f8733821b49e916bd/utils/augmentations.py#L19

I checked the yolov5 code here. It looks like they don't implement the method the paper describes.

Paper quote:

Our approach for generating new data using Copy-Paste is very simple. We randomly select two images and apply random scale jittering and random horizontal flipping on each of them. Then we select a random subset of objects from one of the images and paste them onto the other image. Lastly, we adjust the ground-truth annotations accordingly: we remove fully occluded objects and update the masks and bounding boxes of partially occluded objects.
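The paper's core step could be sketched like this (function and argument names are mine; random scale jittering, flipping, and the random subset selection are assumed to happen before the call):

```python
import numpy as np

def paste_instances(img_a, masks_a, img_b, paste_masks):
    """Sketch of the core Copy-Paste step from arXiv:2012.07177.

    img_a, img_b: HxWx3 uint8 arrays of equal shape.
    masks_a:      list of HxW bool instance masks for image A.
    paste_masks:  randomly chosen subset of image B's instance masks.
    """
    pasted = np.zeros(img_a.shape[:2], dtype=bool)
    out = img_a.copy()
    for m in paste_masks:
        out[m] = img_b[m]          # copy B's pixels under each chosen mask
        pasted |= m
    # Adjust A's ground truth: shrink partially occluded masks,
    # drop fully occluded objects, then add the pasted instances.
    new_masks = [m & ~pasted for m in masks_a]
    new_masks = [m for m in new_masks if m.any()]
    return out, new_masks + list(paste_masks)
```

The random subset selection the paper describes would then be something like `paste = [m for m in masks_b if rng.random() < 0.5]`; bounding boxes can be recomputed from the updated masks afterwards.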

What they do:

zetyquickly commented 5 months ago

Regarding this PR.

I haven't checked the implementation in detail, but there are two points so far:

ternaus commented 5 months ago

@zetyquickly

Loading everything into memory versus loading from disk is a matter of personal preference.

The latest transform that needed to load extra data from disk looks like this:

https://github.com/albumentations-team/albumentations/blob/main/albumentations/augmentations/mixing/transforms.py

We have a pair of:

reference_data (Optional[Union[Generator[ReferenceImage, None, None], Sequence[Any]]]):
            A sequence or generator of dictionaries containing the reference data for mixing
            If None or an empty sequence is provided, no operation is performed and a warning is issued.

and

read_fn (Callable[[ReferenceImage], Dict[str, Any]]):
            A function to process items from reference_data. It should accept items from reference_data
            and return a dictionary containing processed data:
                - The returned dictionary must include an 'image' key with a numpy array value.
                - It may also include 'mask' and 'global_label', each associated with numpy array values.
            Defaults to a function that assumes input dictionary contains numpy arrays and directly returns it.

reference_data can be a generator, or a sequence of ids, paths, or images loaded into memory,

and read_fn is a function that maps a reference_data element to something the transform uses.

=> if a person wants to load everything into memory beforehand, load it into reference_data and use `lambda x: x` as read_fn;

if you want to read on the fly, let all the work happen in read_fn.
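As a toy illustration of the two modes (the `apply_mix` stand-in below is hypothetical, not the actual albumentations API):

```python
import numpy as np

def apply_mix(reference_data, read_fn):
    """Toy stand-in for a transform taking the reference_data/read_fn pair:
    pick one reference item and let read_fn turn it into {'image': ndarray, ...}."""
    item = next(iter(reference_data))
    data = read_fn(item)
    assert isinstance(data["image"], np.ndarray)
    return data["image"]

# Mode 1: everything preloaded into memory -> identity read_fn.
preloaded = [{"image": np.zeros((2, 2, 3), dtype=np.uint8)}]
img1 = apply_mix(preloaded, lambda x: x)

# Mode 2: reference_data holds ids; read_fn loads on the fly
# (from a dict here; real code would decode an image file).
store = {"a": np.ones((2, 2, 3), dtype=np.uint8)}
img2 = apply_mix(["a"], lambda key: {"image": store[key]})
```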


It looks like a lot of different functionality is added in that PR; I would probably split it into separate PRs.