aleju / imgaug

Image augmentation for machine learning experiments.
http://imgaug.readthedocs.io
MIT License

Sequence of warps may discard some image content #80

Open · isarandi opened this issue 6 years ago

isarandi commented 6 years ago

When building a long sequential augmentation and appending warps depending on what is needed, one may end up with a sequence that a) is computed inefficiently and b) throws away some image content.

Consider iaa.Sequential([iaa.Affine(scale=2), iaa.Affine(scale=0.5)]). Ideally this would be equivalent to iaa.Affine(scale=1)*, but that seems hard to achieve in imgaug. A more realistic use case is applying a translation before a rotation and scaling in order to center on some important point of the image, like iaa.Sequential([iaa.Affine(translate_px=10), iaa.Affine(rotate=(-20,20), scale=0.5)]). The translation part of iaa.Affine gets applied after the rotation and scaling, so we cannot simply put the 10 there (we'd need to transform the translation vector by hand first).
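For illustration, a minimal sketch of the scale=2 followed by scale=0.5 case (the image and its shape are made up; the augmenter calls are the standard imgaug API):

```python
import numpy as np
import imgaug.augmenters as iaa

image = np.random.randint(0, 255, (128, 128, 3), dtype=np.uint8)

# The first Affine zooms in but keeps the 128x128 canvas, so the outer
# parts of the image are cropped away; the second Affine zooms back out,
# but the cropped content is already gone.
seq = iaa.Sequential([iaa.Affine(scale=2.0), iaa.Affine(scale=0.5)])
lossy = seq.augment_image(image)

# The "ideal" single combined warp with scale=1 would keep all content.
intact = iaa.Affine(scale=1.0).augment_image(image)
```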

In theory one could distinguish between augmentations that modify pixel locations (warp the image) and those that only apply local effects, like blurring. Any chain of warping augmentations could then be collapsed into one, by matrix multiplication when possible. When that's not possible because a transform is non-linear, one could chain the mapping functions of each warping transform and pass the resulting function as inverse_map to skimage.transform.warp.
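For the linear case, a rough sketch of collapsing two affine steps into a single matrix before one skimage.transform.warp call (the specific transforms here are arbitrary examples):

```python
import numpy as np
from skimage.data import astronaut
from skimage.transform import AffineTransform, warp

image = astronaut()

step1 = AffineTransform(translation=(10, 0))                        # applied first
step2 = AffineTransform(rotation=np.deg2rad(20), scale=(0.5, 0.5))  # applied second

# Collapse both steps into one matrix (the right-most matrix acts first),
# so only a single warp is performed and no intermediate image is cropped.
combined = AffineTransform(matrix=step2.params @ step1.params)

# warp() expects the mapping from output to input coordinates,
# hence the inverse of the forward transform.
out = warp(image, combined.inverse)
```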

A current workaround is to build the geometric transformations outside of imgaug using matrix multiplications or skimage.transform, and to use imgaug only for the non-warping augmentations like Gaussian noise, multiplication by scalars, dropout etc.
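A sketch of that workaround, assuming a translate-then-rotate/scale geometry step and a few arbitrary non-warping imgaug augmenters:

```python
import numpy as np
import imgaug.augmenters as iaa
from skimage.transform import AffineTransform, warp

# Geometric part: combined into a single warp outside of imgaug.
translate = AffineTransform(translation=(10, 0))
rot_scale = AffineTransform(rotation=np.deg2rad(20), scale=(0.5, 0.5))
geom = AffineTransform(matrix=rot_scale.params @ translate.params)  # translate first

# Non-warping part: handled by imgaug as usual.
photometric = iaa.Sequential([
    iaa.AdditiveGaussianNoise(scale=10),
    iaa.Multiply((0.8, 1.2)),
    iaa.Dropout(p=0.05),
])

image = np.random.randint(0, 255, (128, 128, 3), dtype=np.uint8)
warped = warp(image, geom.inverse, preserve_range=True).astype(np.uint8)
augmented = photometric.augment_image(warped)
```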

*) Perhaps some users actually make use of and like the current behavior to achieve some sort of cropping effects, but I think it's not a frequent thing.

aleju commented 6 years ago

My rough assessment of merging operations that change pixel locations would be:

Advantages:

  1. Performance boost
  2. Avoids sometimes throwing away image content

Disadvantages:

  1. Kinda goes against the independence of augmenters. It would require code in Sequential, SomeOf, etc. that analyzes child augmenters and merges them dynamically.
  2. Complicated to implement. Augmenters that work fairly differently would somehow have to be merged (e.g. PiecewiseAffine and Affine).
  3. Smells like a construct that is going to induce lots of bugs and unintuitive behaviours.
  4. Decreases variance in the outputs. E.g. the sequence [translate by 20px, apply coarse dropout, rotate by 45deg] is very different from [translate by 20px, rotate by 45deg, apply coarse dropout]. In the first one, the dropped rectangular areas are rotated by 45 degrees, but not in the second. Merging decreases opportunities for such variance.

So I can see quite a few disadvantages/risks (and lots of work) with rather few advantages. And the performance advantage is often not necessary when using background augmentation on multiple cores. Though a case could be made to merge affine transformations when they would end up being executed right after one another, e.g. in [crop by x, translate by 10px, rotate by 45, add gaussian noise] merge the translation and rotation - but don't merge them in [crop by x, translate by 10px, add gaussian noise, rotate by 45].

For the simpler case of letting the user combine affine matrices intentionally (e.g. first translate by xyz px, then rotate by xyz deg), it seems more appropriate to just add a function to Affine or even a new augmenter, like AffineMatrix, and do something like AffineMatrix.create(translate=20).stack(rotate=(35, 45)).
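To be clear, AffineMatrix, create() and stack() do not exist yet; a hypothetical sketch of what such stacking could reduce to internally might look like this:

```python
import numpy as np

# Purely hypothetical: this only illustrates what "stacking" could collapse
# to internally, i.e. a single 3x3 matrix built up by matrix multiplication
# and then used for one single warp.
def translation(tx, ty=0.0):
    return np.array([[1, 0, tx],
                     [0, 1, ty],
                     [0, 0, 1]], dtype=float)

def rotation(deg):
    rad = np.deg2rad(deg)
    c, s = np.cos(rad), np.sin(rad)
    return np.array([[c, -s, 0],
                     [s,  c, 0],
                     [0,  0, 1]], dtype=float)

# AffineMatrix.create(translate=20).stack(rotate=(35, 45)) could collapse to
# something like this (rotation sampled from (35, 45), here fixed at 40):
combined = rotation(40) @ translation(20)  # translation applied first
```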