Open ternaus opened 2 years ago
@ternaus Hi, I'm interested in the feature.
How can we specify the pasted object in an Albumentations transform? Do we need to introduce a new target keyword like `paste_image`:
transform(image=image, paste_image=paste_image, ...)
, or should we sample segments from the same target image, as yolov5 does?
I think we could support this:
- `paste_image_key` param for the transform: if this key is set, we will use the provided images
- `paste_image_dir` param for the transform: if this key is set, we will sample a random image from this directory

@Dipet thank you for your reply.
Do you mean that `paste_image_key` is used to set a path to a PNG file as a target?
transform(image=image, paste_image_key="path/to/png", ...)
If so, I think using `paste_image_dir` is a better choice, since `paste_image_dir` can be passed as a parameter of the constructor. That way we avoid introducing new targets and follow the standard usage.
Ex.
transform = Compose([
    CopyAndPaste(paste_image_dir=objects_dir, …),
    ...
])
transform(image=image, …)  # we can still use only the standard targets: bboxes and masks
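To make the two proposed options concrete, here is a minimal sketch of how a transform might resolve its paste source from either a `paste_image_key` target or a `paste_image_dir` directory. The class name `CopyAndPasteSketch` and its methods are illustrative only, not Albumentations API:

```python
import os
import random
from typing import Optional

import numpy as np


class CopyAndPasteSketch:
    """Illustrative only: resolves the paste image either from an explicit
    target key or from a directory of candidate files, per the proposal."""

    def __init__(self, paste_image_key: Optional[str] = None,
                 paste_image_dir: Optional[str] = None):
        # exactly one of the two modes must be configured
        if (paste_image_key is None) == (paste_image_dir is None):
            raise ValueError("Set exactly one of paste_image_key / paste_image_dir")
        self.paste_image_key = paste_image_key
        self.paste_image_dir = paste_image_dir

    def resolve_paste_source(self, **targets) -> np.ndarray:
        if self.paste_image_key is not None:
            # the caller passed the paste image directly, e.g.
            # transform(image=image, paste_image=paste_image)
            return targets[self.paste_image_key]
        # otherwise sample a random file from the configured directory
        name = random.choice(os.listdir(self.paste_image_dir))
        return self._load(os.path.join(self.paste_image_dir, name))

    def _load(self, path: str) -> np.ndarray:
        raise NotImplementedError  # stand-in for cv2.imread / PIL loading


t = CopyAndPasteSketch(paste_image_key="paste_image")
src = np.zeros((4, 4, 3), dtype=np.uint8)
print(t.resolve_paste_source(image=src, paste_image=src).shape)
```

With `paste_image_key`, the extra image travels through the usual `transform(...)` call; with `paste_image_dir`, sampling happens inside the transform and the call signature stays unchanged.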
To get a feel for it, I made a workable example in PR #1297 (still a work in progress).
Feature description
Add CopyAndPaste Augmentation from https://arxiv.org/abs/2012.07177
Used in github.com/ultralytics/yolov5/blob/ac6c4383bc0c7a2a4f7ca18f8733821b49e916bd/utils/augmentations.py#L19
Checked the yolov5 code, here. It looks like they don't implement the method the paper describes.
Paper quote:
Our approach for generating new data using Copy-Paste is very simple. We randomly select two images and apply random scale jittering and random horizontal flipping on each of them. Then we select a random subset of objects from one of the images and paste them onto the other image. Lastly, we adjust the ground-truth annotations accordingly: we remove fully occluded objects and update the masks and bounding boxes of partially occluded objects.
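As a minimal illustration of the procedure the paper quote describes (this is not the yolov5 code and not an Albumentations implementation; the function name, signature, and `p_select` parameter are made up for this sketch), the paste-and-update-annotations step could look like:

```python
import numpy as np

rng = np.random.default_rng(0)


def copy_paste(dst_img, dst_masks, src_img, src_masks, p_select=0.5):
    """Sketch of the paper's Copy-Paste step, applied after both images
    have already been independently scale-jittered and flipped: paste a
    random subset of source objects, then update destination masks."""
    out_img = dst_img.copy()
    pasted = np.zeros(dst_img.shape[:2], dtype=bool)
    kept_src_masks = []
    for m in src_masks:  # each m is a boolean (H, W) instance mask
        if rng.random() < p_select:
            out_img[m] = src_img[m]  # copy the object's pixels over
            pasted |= m
            kept_src_masks.append(m)
    out_masks = []
    for m in dst_masks:
        visible = m & ~pasted        # occlude where new objects landed
        if visible.any():            # remove fully occluded objects
            out_masks.append(visible)
    out_masks.extend(kept_src_masks)
    return out_img, out_masks


dst = np.zeros((4, 4, 3), dtype=np.uint8)
src = np.full((4, 4, 3), 255, dtype=np.uint8)
obj = np.zeros((4, 4), dtype=bool)
obj[:2, :2] = True
img, masks = copy_paste(dst, [np.ones((4, 4), dtype=bool)], src, [obj],
                        p_select=1.0)
```

Updating bounding boxes would follow the same pattern: recompute each box from its surviving mask.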
What they do:
Regarding this PR:
I haven't checked the implementation in detail, but there are two points so far. The parameters
`paste_image_dir=object_dir` and
`get_label_from_path=get_label_from_path`
appear suboptimal. It would be better to handle everything in memory, working directly with already loaded images, labels, and masks.
@zetyquickly
Loading everything into memory versus loading from disk can be a matter of personal preference.
In the latest transform that needed to load extra data from disk, it looks like this.
We have a pair of:
reference_data (Optional[Union[Generator[ReferenceImage, None, None], Sequence[Any]]]):
A sequence or generator of dictionaries containing the reference data for mixing
If None or an empty sequence is provided, no operation is performed and a warning is issued.
and
read_fn (Callable[[ReferenceImage], Dict[str, Any]]):
A function to process items from reference_data. It should accept items from reference_data
and return a dictionary containing processed data:
- The returned dictionary must include an 'image' key with a numpy array value.
- It may also include 'mask' and 'global_label', each associated with numpy array values.
Defaults to a function that assumes input dictionary contains numpy arrays and directly returns it.
reference_data can be a generator, or a sequence of ids, paths, or images loaded into memory,
and read_fn is a function that maps a reference_data element to something the transform uses.
=> if a person wants to load everything into memory beforehand => load it into reference_data and use lambda x: x as read_fn;
=> if you want to read on the fly => let all the work happen in read_fn.
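A small sketch of those two usage patterns; `load_image` is a hypothetical loader, and the returned dict shape follows the read_fn contract quoted above (an 'image' key holding a numpy array):

```python
import numpy as np

# Pattern 1, in-memory: reference_data already holds dicts of numpy
# arrays, so read_fn is simply the identity (lambda x: x).
in_memory_refs = [{"image": np.zeros((8, 8, 3), dtype=np.uint8)}]
identity_read_fn = lambda x: x

# Pattern 2, on the fly: reference_data holds lightweight ids/paths
# and read_fn does the actual loading when the transform needs it.
def path_read_fn(path):
    return {"image": load_image(path)}

def load_image(path):
    # hypothetical helper, e.g. cv2.imread under the hood
    raise NotImplementedError

item = identity_read_fn(in_memory_refs[0])
print(item["image"].shape)
```

Either way the transform itself only ever sees the dict that read_fn returns, so the storage strategy stays the caller's choice.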
It looks like a lot of different functionality is added in that PR; I would probably split it into separate PRs.