Open isarandi opened 6 years ago
A rough assessment of merging operations that change pixel locations: `[translate by 20px, apply coarse dropout, rotate by 45deg]` is very different from `[translate by 20px, rotate by 45deg, apply coarse dropout]`. In the first case the dropped rectangular areas are rotated by 45 degrees, in the second they are not. Merging decreases the opportunities for such variance.

So I can see quite a few disadvantages/risks (and a lot of work) and rather few advantages. The performance advantage is also often unnecessary when background augmentation runs on multiple cores. Still, a case could be made for merging affine transformations that would be executed directly after one another, e.g. in `[crop by x, translate by 10px, rotate by 45, add gaussian noise]` merge the translation and rotation, but don't merge them in `[crop by x, translate by 10px, add gaussian noise, rotate by 45]`.
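The non-commutativity behind this can be seen directly on the transformation matrices. A minimal numpy sketch (not imgaug code) comparing the two orders:

```python
# Sketch: why the order of geometric operations matters. Composing
# "translate then rotate" vs. "rotate then translate" as 3x3 homogeneous
# matrices gives different results.
import numpy as np

def translation(tx, ty):
    return np.array([[1.0, 0.0, tx],
                     [0.0, 1.0, ty],
                     [0.0, 0.0, 1.0]])

def rotation(deg):
    c, s = np.cos(np.deg2rad(deg)), np.sin(np.deg2rad(deg))
    return np.array([[c,   -s,  0.0],
                     [s,    c,  0.0],
                     [0.0, 0.0, 1.0]])

# rotate(45) applied after translate(20, 0): the translation vector
# itself gets rotated ...
a = rotation(45) @ translation(20, 0)
# ... whereas translate(20, 0) applied after rotate(45) leaves it as-is.
b = translation(20, 0) @ rotation(45)

print(np.allclose(a, b))  # False -- the two orders are not equivalent
```

The same reasoning applies to non-linear operations like dropout: anything executed before a warp gets warped along with the image.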
For the simpler case of letting the user combine affine matrices intentionally (e.g. first translate by xyz px, then rotate by xyz deg), it seems more appropriate to just add a function to `Affine`, or even a new augmenter like `AffineMatrix`, and do something like `AffineMatrix.create(translate=20).stack(rotate=(35, 45))`.
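A minimal sketch of what such a hypothetical `AffineMatrix` could look like — none of these names exist in imgaug, stacking is reduced to a matrix product, and sampling from random ranges like `(35, 45)` is omitted for brevity:

```python
# Hypothetical sketch of the proposed AffineMatrix idea (not imgaug API):
# each call builds a 3x3 homogeneous matrix, stack() composes them.
import numpy as np

class AffineMatrix:
    def __init__(self, matrix):
        self.matrix = matrix  # 3x3 homogeneous transformation matrix

    @classmethod
    def create(cls, translate=0, rotate=0, scale=1.0):
        c, s = np.cos(np.deg2rad(rotate)), np.sin(np.deg2rad(rotate))
        m = np.array([[scale * c, -scale * s, float(translate)],
                      [scale * s,  scale * c, 0.0],
                      [0.0,        0.0,       1.0]])
        return cls(m)

    def stack(self, translate=0, rotate=0, scale=1.0):
        # Apply the new transform after the current one: new @ old.
        other = AffineMatrix.create(translate=translate, rotate=rotate,
                                    scale=scale)
        return AffineMatrix(other.matrix @ self.matrix)

# translate by 20 px, then rotate by 40 deg, merged into one matrix
tform = AffineMatrix.create(translate=20).stack(rotate=40)
```

Since the whole chain ends up as a single matrix, only one interpolation pass over the image is needed.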
When building a long sequential augmentation pipeline and appending warps as needed, one may end up with a sequence that is (a) computed inefficiently and (b) throws away some image content.
Consider `iaa.Sequential([iaa.Affine(scale=2), iaa.Affine(scale=0.5)])`. Ideally this would be equivalent to `iaa.Affine(scale=1)`
*. But that seems hard to achieve in imgaug. A more realistic use case is applying a translation before a rotation and scaling, in order to center on some important point of the image, like `iaa.Sequential([iaa.Affine(translate_px=10), iaa.Affine(rotate=(-20, 20), scale=0.5)])`. The translation part of `iaa.Affine` is applied after the rotation and scaling, so we cannot simply put the `10` there (we'd need to transform the translation vector by hand first).

In theory one could distinguish between augmenters that modify pixel locations (i.e. warp the image) and those that only apply local effects, such as blurring. Any chain of warping augmenters could then be collapsed into one: by matrix multiplication where possible, and, where that is not possible because a transform is non-linear, by chaining the mapping functions of the warping transforms and passing the resulting function as `inverse_map` to `skimage.transform.warp`.

A current workaround is to build the geometric transformations outside of imgaug, using matrix multiplications or `skimage.transform`, and to use imgaug only for the non-warping augmentations such as Gaussian noise, multiplication by scalars, dropout, etc.
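A minimal sketch of that workaround, assuming plain numpy for the matrix algebra (the composed matrix could then be handed to `skimage.transform.warp`, e.g. via its inverse as `inverse_map`, while imgaug handles only the pixel-level augmenters):

```python
# Sketch of the workaround: compose the geometric part outside of imgaug
# as one matrix, so the whole chain needs only a single warp. This collapses
# the equivalent of Sequential([Affine(translate_px=10),
# Affine(rotate=20, scale=0.5)]) (fixed values instead of random ranges).
import numpy as np

def affine(translate=(0.0, 0.0), rotate_deg=0.0, scale=1.0):
    c, s = np.cos(np.deg2rad(rotate_deg)), np.sin(np.deg2rad(rotate_deg))
    return np.array([[scale * c, -scale * s, translate[0]],
                     [scale * s,  scale * c, translate[1]],
                     [0.0,        0.0,       1.0]])

t = affine(translate=(10.0, 0.0))       # first: translate by 10 px
rs = affine(rotate_deg=20, scale=0.5)   # then: rotate by 20 deg, scale by 0.5
combined = rs @ t                       # one matrix for the whole chain

# The translation vector (10, 0) ends up transformed by the rotation/scale
# part -- exactly the by-hand adjustment mentioned above.
print(combined[:2, 2])  # equals rs[:2, :2] @ [10, 0]
```

With the geometry collapsed like this, the image content is resampled only once instead of once per `Affine` step.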
*) Perhaps some users actually rely on and like the current behavior to achieve some sort of cropping effect, but I don't think that is common.