Evizero / Augmentor.jl

A fast image augmentation library in Julia for machine learning.
https://evizero.github.io/Augmentor.jl/

Unify the augment interface #97

Open barucden opened 3 years ago

barucden commented 3 years ago

tl;dr Remove augmentbatch! and have augment and augment! support batch inputs. I think it's doable but there are some issues.


Currently we have augment, augment!, and augmentbatch!, and we would also be interested in a non-mutating augmentbatch.

Should we drop the "batch" versions, and keep just augment and augment!? And if so, can we do it? I tend to think yes and yes.

Why

It simplifies the interface: the user does not have to care whether they are working with batches or single images; the same function works for both.

Having only a single function would help us solve #33 easily, too.

How

I believe the main issue is telling whether an input is a batch or a single image based on its type. Once that is settled, we can simply rely on dispatch to call the appropriate code.

In my limited view of the world, an image is a 3-dimensional array of shape (H, W, C), where H, W, C is the height, width, and number of channels. When C=1, the image can as well be a 2-dimensional array of shape (H, W).

A batch is a 4-dimensional array of shape (N, H, W, C), where N is the number of images. Again, when C=1, the batch can be a 3-dimensional array of shape (N, H, W).

In Julia, the situation is a little bit more complicated, as the array elements tend to be structure instances (like RGB) and the "channel" dimension is omitted.

Also, the dimensions can be permuted (e.g., (H, W, C, N)), but Augmentor already deals with that to some extent.
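To make the dispatch idea concrete, here is a minimal sketch (isbatch and the two internal helpers are hypothetical names, the channel handling is simplified, and the ambiguous case is left out on purpose):

using ColorTypes: Colorant

# Hypothetical trait: decide batch vs. image from the array type alone
isbatch(::AbstractArray{<:Colorant, 3}) = true   # stack of color/grayscale images
isbatch(::AbstractArray{<:Colorant, 2}) = false  # a single color image
isbatch(::AbstractArray{<:Number, 4})   = true   # numeric batch with a channel dimension
isbatch(::AbstractArray{<:Number, 2})   = false  # a single grayscale image
# AbstractArray{<:Number, 3} is the ambiguous case discussed below

augment(input, pl) = isbatch(input) ? augment_batch(input, pl) : augment_image(input, pl)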

The following table maps the input types to the batch/image decision. Unfortunately, there is one ambiguity (the affected rows are shown in bold).

| Type | Shape | Element | Example | Decision |
|------|-------|---------|---------|----------|
| AbstractArray{<:Number, 4} | (N, H, W, C) | intensity in a channel | batch of RGB images | batch |
| AbstractArray{<:Color{T, 3}, 3} | (N, H, W) | 3-d value for all channels | batch of RGB images | batch |
| AbstractArray{<:Color{T, 1}, 3} | (N, H, W) | pixel intensity | batch of grayscale images | batch |
| **AbstractArray{<:Number, 3}** | (N, H, W) | pixel intensity | batch of grayscale images | **batch** |
| **AbstractArray{<:Number, 3}** | (H, W, C) | intensity in a channel | one RGB image | **image** |
| AbstractArray{<:Color{T, 3}, 2} | (H, W) | 3-d value for all channels | one RGB image | image |
| AbstractArray{<:Color{T, 1}, 2} | (H, W) | pixel intensity | one grayscale image | image |
| AbstractArray{<:Number, 2} | (H, W) | pixel intensity | one grayscale image | image |

I came up with two ways to resolve the ambiguity, both of which are sort-of bad:

  1. Just go with one of the two options and document it.
  2. Support arrays of colors only, not numbers.

I could see this ambiguity taking down the whole proposal. Still, I would love it if we could figure it out.

Mixing batches and images

Now, it is possible to do

augment((img1, img2), pl)

which applies the same operations to both img1 and img2. On the contrary, this

augmentbatch(batch, pl)

applies possibly different operations to different images in batch (assuming the non-mutating version of augmentbatch! exists). Consequently, the following would not be well-defined:

augment((img1, img2, batch), pl)

There is a contradiction in it: "apply the same operations to img1, img2, and batch, and also apply different operations to images in batch".

For this reason, I think we should disallow mixing batches and images on the input.

Dealing with semantic wrappers

We would require the following to work:

# Augment an image
augment(img, pl)
# Augment an image and its mask (same operations applied to both)
augment(img => mask, pl)
# An alternative to the previous
augment((img, Mask(mask)), pl)
# Generalization of the previous for more inputs
augment((img, Mask(mask), KeyPoints(kp)), pl)

# Augment a batch of images
augment(batch, pl)
# Augment a batch of images and their masks 
# (same operations applied to corresponding image-mask pairs)
augment(batch => masks, pl)
# An alternative to the previous
augment((batch, Mask(masks)), pl)
# Generalization of the previous
# Probably length(batch) == length(masks) == length(kps)
augment((batch, Mask(masks), KeyPoints(kps)), pl)

Questions for discussion

I am interested in any opinions you have, but here are two questions to start the discussion:

  1. Does it look desirable to you?
  2. How do you think we should deal with the input ambiguity?
johnnychen94 commented 3 years ago

This is a long issue and I don't have enough bandwidth to properly discuss a potential solution at the moment.

There are two points I'm concerned with:

I'm not very sure what my opinion is on combining augment and augmentbatch!. But just to make sure we're on the same ground, I think the main difficulty of augmentbatch! is that:

I think the difference is: augment handles the smallest input item, which may consist of multiple data (e.g., image, mask, keypoints) with a strong connection between them, e.g., they share the same random parameters. On the other hand, augmentbatch! deals with multiple input items, none of which is related to the others.


I also noticed that @lorenzoh is developing a new package, DataAugmentation.jl, so I'd like to invite him to the discussion.

barucden commented 3 years ago

Thank you for your feedback!

> So asking the users to explicitly expand an Array{RGB{Float64}, 2} into Array{Float64, 3} is not a good design. I think this is where the ambiguity comes from.

I agree. Just note that the ambiguity comes from using numeric arrays, not colorant ones. Array{RGB{Float64}, 2} is not ambiguous, but Array{Float64, 3} is (it can be both a batch of shape (N, H, W) and an image of shape (H, W, C)).
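To spell the ambiguity out with two throwaway arrays (sizes made up):

# Two arrays of the exact same type that mean very different things:
batch = rand(Float64, 8, 16, 16)   # meant as (N, H, W): eight 16×16 grayscale images
image = rand(Float64, 16, 16, 3)   # meant as (H, W, C): one 16×16 RGB image
typeof(batch) == typeof(image)     # true; the type alone cannot tell them apart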

> For better batch support, because we don't know which dimension is the batch dimension, we might still stick to MLDataPattern.

This is what we use now, right? I did not mention it in the first post but I also think this is a good approach.

> I think the difference is: ...

Great summary. I feel like augmentbatch! does not provide much more than what augment does. It's basically this (not sure if this is completely correct, but you get the idea):

batch = [img1, img2, ...]
aug_batch = augment.(batch, Ref(pl))   # Ref so the pipeline itself is not broadcast over
# or
batch = cat(img1, img2, dims=3)        # (H, W, N)
aug_batch = cat(augment.(obsview(batch, ObsDim.Last()), Ref(pl))..., dims=3)

I would be interested in @maxfreu's opinion too.

lorenzoh commented 3 years ago

Hey, author of DataAugmentation.jl here.

I created DataAugmentation.jl to address some of my pain points with Augmentor.jl, though that was a while ago and I don't know how much has changed since then.

In DataAugmentation.jl

See the following links:

DataAugmentation.jl doesn't have support for batches yet, but that will be implemented in the future as a wrapper around a semantic wrapper, i.e. instead of having a single image Image{2, RGB{N0f8}}(imdata) you would have Batch{Image{2, RGB{N0f8}}}(imdatas). The default would then be to take views of each observation in the batch and apply the transformations to those, which could be overridden by implementations on the whole batch that may be more performant.
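Roughly sketched (nothing of this exists yet, so the names and details are only indicative, not actual DataAugmentation.jl API):

# Hypothetical Batch wrapper around a vector of semantic wrappers
struct Batch{T}
    items::Vector{T}
end

# Default fallback: apply the transform to every item independently;
# a concrete transform could override this with a faster whole-batch method.
apply(tfm, batch::Batch) = Batch([apply(tfm, item) for item in batch.items])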

maxfreu commented 3 years ago

In my opinion it is absolutely ok and very clear to have two functions, one for single images, one for batches. Makes things easier programming-wise and it's also easy to communicate to users.

@lorenzoh what exactly were your pain points? I think Augmentor's design is flexible enough to be applied to various data types, and support for masks is in the making, although keypoints etc. are still missing. I wonder how Augmentor's optimized, generated functions compare to DataAugmentation's use of buffers. I think the best thing would be to concentrate work on a single one-fits-all package (which in my opinion should be Augmentor, because it's fast). Also note that there is a third, small package: Augmentations.jl

Now it gets lengthy and maybe a bit off-topic :D

Just to give my use case: I work with satellite images, which can come with any number of bands, without a specific color connotation. When I load a single image from disk via ArchGDAL, it is usually an Array{Float32, 3} with dim ordering WHC (@barucden, keep in mind that Julia is column-major; I think your table above reflects row-major order). To make this work with Augmentor, I have to permute to CWH and perform reinterpret gymnastics to SVectors, roughly like the sketch below.
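Something along these lines (just a sketch; the band count and sizes are made up):

using StaticArrays

raster = rand(Float32, 256, 256, 5)      # (W, H, C), as ArchGDAL hands it over
cwh    = permutedims(raster, (3, 1, 2))  # channels first: (C, W, H)
# reinterpret the channel dimension as one static vector per pixel -> (W, H) matrix of 5-band pixels
pixels = reshape(reinterpret(SVector{5, Float32}, vec(cwh)), 256, 256)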

> I know this is used as a default in other frameworks, but in Julia, we still tend to keep the colorant information. So asking the users to explicitly expand an Array{RGB{Float64}, 2} into Array{Float64, 3} is not a good design. I think this is where the ambiguity comes from.

Yes, that's maybe true, but sometimes it's also a nuisance that I can't just throw 3D arrays in. Albumentations, for example, expects CWH inputs too; knowing this, I can follow the contract and that's it. Clearly not the most flexible solution, but it works. Having prepared the images for Augmentor, I sometimes miss augmentations, especially color augmentations for non-RGB data. Taking this and other deep-learning use cases into consideration, my personal list of priorities for an augmentation package looks like this:

  1. Support many different image augmentations
  2. Support for raster masks, keypoints, bounding boxes, polygons
  3. Speed & multi-threading (already quite well addressed here I think)
  4. Good support for non-RGB images
  5. Support for different types of input, like Matrix{SomeColorType}, Matrix{AbstractVector}, Array{T,N} with obsdim or even NamedDimsArray{CWH} to clarify the dimension meaning.
  6. Related to 5: Not having to transpose my data before and after, because my data comes in WHC and Flux needs WHCN. But I think for augmentations it's ok to use CWH, because what's close in RAM gets transformed together, which SIMDs well.
  7. Sofa-comfortable API

So for the time being, in my opinion development should focus on 1 & 2, things like this issue can come later unless they are a blocker for other work.

johnnychen94 commented 3 years ago

There are two things that augmentbatch! does quite well:

Just noticed the example you provided in https://github.com/Evizero/Augmentor.jl/issues/54#issuecomment-897022475

outbatch(X) = Array{Float32}(undef, (28, 28, 1, nobs(X)))
augmentbatch((X, y)) = augmentbatch!(outbatch(X), X, pl), y

Every call of augmentbatch would introduce a memory allocation and thus hurt the overall performance somewhat.

I didn't check it, but I'm wondering if augment + DataLoader could replace augmentbatch!.
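For illustration, the idea could look roughly like this (untested sketch; assumes MLUtils' DataLoader or the equivalent in Flux, a vector of images imgs with matching labels, and a pipeline pl):

using MLUtils: DataLoader

loader = DataLoader((imgs, labels); batchsize = 32, shuffle = true)
for (xs, ys) in loader
    aug = augment.(xs, Ref(pl))  # same pipeline, fresh random parameters per image
    # ... collate `aug` into an array and feed it to the model together with `ys`
end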

barucden commented 3 years ago

> keep in mind that Julia is column-major; I think your table above reflects row-major order

Yes, you are right. I am sorry if the order that I used in the table confused anyone. Hopefully, it did not overshadow the main point of the proposal.

I now realize I was maybe too eager to propose this. I did not have a clear vision of how to proceed further, and now I see there are more steps to take before we can even discuss what's proposed in the original post.

I took a quick look at DataAugmentation.jl, and I think we have some catching up to do in some aspects.

@johnnychen94 Doesn't my example allocate memory for each batch too? In my work, batches quite often have different sizes, so I tend to just allocate a new batch each time. I'll check DataLoader. Thanks for the reference.