Evizero / Augmentor.jl

A fast image augmentation library in Julia for machine learning.
https://evizero.github.io/Augmentor.jl/

Unify the augment interface #97

Open barucden opened 3 years ago

barucden commented 3 years ago

tl;dr Remove augmentbatch! and have augment and augment! support batch inputs. I think it's doable but there are some issues.


Currently we have augment, augment!, and augmentbatch!, and we would also be interested in a non-mutating augmentbatch.

Should we drop the "batch" versions, and keep just augment and augment!? And if so, can we do it? I tend to think yes and yes.

Why

It simplifies the interface: the user does not have to care whether they are working with batches or single images; the same function works for both.

Having only a single function would help us solve #33 easily, too.

How

I believe the main issue is telling whether an input is a batch or a single image based on its type. Once that is settled, we can simply rely on dispatch to call the appropriate code.

In my limited view of the world, an image is a 3-dimensional array of shape (H, W, C), where H, W, C is the height, width, and number of channels. When C=1, the image can as well be a 2-dimensional array of shape (H, W).

A batch is a 4-dimensional array of shape (N, H, W, C), where N is the number of images. Again, when C=1, the batch can be a 3-dimensional array of shape (N, H, W).

In Julia, the situation is a little bit more complicated, as the array elements tend to be structure instances (like RGB) and the "channel" dimension is omitted.

Also, the dimensions can be permuted (e.g., (H, W, C, N)), but Augmentor already deals with that to some extent.
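To make the dispatch idea concrete, here is a minimal sketch (isbatch and the two internal helpers are hypothetical names, the channel handling is simplified, and the ambiguous case is left out on purpose):

using ColorTypes: Colorant

# Hypothetical trait: decide batch vs. image from the array type alone
isbatch(::AbstractArray{<:Colorant, 3}) = true   # stack of color/grayscale images
isbatch(::AbstractArray{<:Colorant, 2}) = false  # a single color image
isbatch(::AbstractArray{<:Number, 4})   = true   # numeric batch with a channel dimension
isbatch(::AbstractArray{<:Number, 2})   = false  # a single grayscale image
# AbstractArray{<:Number, 3} is the ambiguous case discussed below

augment(input, pl) = isbatch(input) ? augment_batch(input, pl) : augment_image(input, pl)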

The following table maps the input types to the batch/image decision. Unfortunately, there is one ambiguity (the affected rows are shown in bold).

| Type | Shape | Element | Example | Decision |
|------|-------|---------|---------|----------|
| AbstractArray{<:Number, 4} | (N, H, W, C) | intensity in a channel | batch of RGB images | batch |
| AbstractArray{<:Color{T, 3}, 3} | (N, H, W) | 3-d value for all channels | batch of RGB images | batch |
| AbstractArray{<:Color{T, 1}, 3} | (N, H, W) | pixel intensity | batch of grayscale images | batch |
| **AbstractArray{<:Number, 3}** | (N, H, W) | pixel intensity | batch of grayscale images | **batch** |
| **AbstractArray{<:Number, 3}** | (H, W, C) | intensity in a channel | one RGB image | **image** |
| AbstractArray{<:Color{T, 3}, 2} | (H, W) | 3-d value for all channels | one RGB image | image |
| AbstractArray{<:Color{T, 1}, 2} | (H, W) | pixel intensity | one grayscale image | image |
| AbstractArray{<:Number, 2} | (H, W) | pixel intensity | one grayscale image | image |

I came up with two ways to resolve the ambiguity, both of which are sort-of bad:

  1. Just go with one of the two options and document it.
  2. Support arrays of colors only, not numbers.

I could see this ambiguity taking down the whole proposal. Still, I would love it if we could figure it out.

Mixing batches and images

Now, it is possible to do

augment((img1, img2), pl)

which applies the same operations to both img1 and img2. On the contrary, this

augmentbatch(batch, pl)

applies possibly different operations to different images in batch (assuming the non-mutating version of augmentbatch! exists). Consequently, the following would not be well-defined:

augment((img1, img2, batch), pl)

There is a contradiction in it: "apply the same operations to img1, img2, and batch, and also apply different operations to images in batch".

For this reason, I think we should disallow mixing batches and images on the input.

Dealing with semantic wrappers

We would require the following to work:

# Augment an image
augment(img, pl)
# Augment an image and its mask (same operations applied to both)
augment(img => mask, pl)
# An alternative to the previous
augment((img, Mask(mask)), pl)
# Generalization of the previous for more inputs
augment((img, Mask(mask), KeyPoints(kp)), pl)

# Augment a batch of images
augment(batch, pl)
# Augment a batch of images and their masks 
# (same operations applied to corresponding image-mask pairs)
augment(batch => masks, pl)
# An alternative to the previous
augment((batch, Mask(masks)), pl)
# Generalization of the previous
# Probably length(batch) == length(masks) == length(kps)
augment((batch, Mask(masks), KeyPoints(kps)), pl)

Questions for discussion

I am interested in any opinions you have, but here are two questions to start the discussion:

  1. Does it look desirable to you?
  2. How do you think we should deal with the input ambiguity?
johnnychen94 commented 3 years ago

This is a long issue and I don't have enough bandwidth to properly discuss a potential solution at the moment.

There are two points I'm concerned with:

I'm not very sure what my opinion is on combining augment and augmentbatch!. But just to make sure we're on the same ground, I think the main difficulty of augmentbatch! is that:

I think the difference is: augment handles the smallest input item, which may consist of multiple data (e.g., image, mask, keypoints) with a strong connection between them, e.g., they share the same random parameters. On the other hand, augmentbatch! deals with multiple input items, none of which is related to the others.


I also noticed that @lorenzoh is developing a new package, DataAugmentation.jl, so I'd like to invite him to the discussion.

barucden commented 3 years ago

Thank you for your feedback!

> So asking the users to explicitly expand an Array{RGB{Float64}, 2} into Array{Float64, 3} is not a good design. I think this is where the ambiguity comes from.

I agree. Just note that the ambiguity comes from using numeric arrays, not colorant ones. Array{RGB{Float64}, 2} is not ambiguous, but Array{Float64, 3} is (it can be both a batch of shape (N, H, W) and an image of shape (H, W, C)).
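To spell the ambiguity out with two throwaway arrays (sizes made up):

# Two arrays of the exact same type that mean very different things:
batch = rand(Float64, 8, 16, 16)   # meant as (N, H, W): eight 16×16 grayscale images
image = rand(Float64, 16, 16, 3)   # meant as (H, W, C): one 16×16 RGB image
typeof(batch) == typeof(image)     # true; the type alone cannot tell them apart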

> For better batch support, because we don't know which dimension is the batch dimension, we might still stick to MLDataPattern.

This is what we use now, right? I did not mention it in the first post but I also think this is a good approach.

> I think the difference is: ...

Great summary. I feel like augmentbatch! does not provide much more than what augment does. It's basically this (not sure if this is completely correct, but you get the idea):

batch = [img1, img2, ...]
aug_batch = augment.(batch, Ref(pl))   # Ref so the pipeline itself is not broadcast over
# or
batch = cat(img1, img2, dims=3)        # (H, W, N)
aug_batch = cat(augment.(obsview(batch, ObsDim.Last()), Ref(pl))..., dims=3)

I would be interested in @maxfreu's opinion too.

lorenzoh commented 3 years ago

Hey, author of DataAugmentation.jl here.

I created DataAugmentation.jl to address some of my pain points with Augmentor.jl, though that was a while ago and I don't know how much has changed since then.

In DataAugmentation.jl

See the following links:

DataAugmentation.jl doesn't have support for batches yet, but that will be implemented in the future as a wrapper around a semantic wrapper, i.e. instead of having a single image Image{2, RGB{N0f8}}(imdata) you would have Batch{Image{2, RGB{N0f8}}}(imdatas). The default would then be to take views of each observation in the batch and apply the transformations to those, which could be overridden by implementations on the whole batch that may be more performant.
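Roughly sketched (nothing of this exists yet, so the names and details are only indicative, not actual DataAugmentation.jl API):

# Hypothetical Batch wrapper around a vector of semantic wrappers
struct Batch{T}
    items::Vector{T}
end

# Default fallback: apply the transform to every item independently;
# a concrete transform could override this with a faster whole-batch method.
apply(tfm, batch::Batch) = Batch([apply(tfm, item) for item in batch.items])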

maxfreu commented 3 years ago

In my opinion it is absolutely ok and very clear to have two functions, one for single images, one for batches. Makes things easier programming-wise and it's also easy to communicate to users.

@lorenzoh what exactly were your pain points? I think Augmentor's design is flexible enough to be applied to various data types, and support for masks is in the making, although keypoints etc. are still missing. I wonder how Augmentor's optimized, generated functions compare to DataAugmentation's use of buffers. I think the best thing would be to concentrate work on a single one-fits-all package (which in my opinion should be Augmentor, because it's fast). Also note that there is a third, small package: Augmentations.jl

Now it gets lengthy and maybe a bit off-topic :D

Just to give my use case: I work with satellite images, which can come with any number of bands, without a specific color connotation. When I load a single image from disk via ArchGDAL, it is usually an Array{Float32, 3} with dim ordering WHC (@barucden, keep in mind that Julia is column-major; I think your table above reflects row-major order). To make this work with Augmentor, I have to permute to CWH and perform reinterpret gymnastics to SVectors, roughly like the sketch below.
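Something along these lines (just a sketch; the band count and sizes are made up):

using StaticArrays

raster = rand(Float32, 256, 256, 5)      # (W, H, C), as ArchGDAL hands it over
cwh    = permutedims(raster, (3, 1, 2))  # channels first: (C, W, H)
# reinterpret the channel dimension as one static vector per pixel -> (W, H) matrix of 5-band pixels
pixels = reshape(reinterpret(SVector{5, Float32}, vec(cwh)), 256, 256)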

> I know this is used as a default in other frameworks, but in Julia, we still tend to keep the colorant information. So asking the users to explicitly expand an Array{RGB{Float64}, 2} into Array{Float64, 3} is not a good design. I think this is where the ambiguity comes from.

Yes, that's maybe true, but sometimes it's also a nuisance that I can't just throw 3D arrays in. Albumentations, for example, expects CWH inputs too; knowing this, I can follow the contract and that's it. Clearly not the most flexible solution, but it works. Having prepared the images for Augmentor, I sometimes miss augmentations, especially color augmentations for non-RGB data. Taking this and other deep-learning use cases into consideration, my personal list of priorities for an augmentation package looks like this:

  1. Support many different image augmentations
  2. Support for raster masks, keypoints, bounding boxes, polygons
  3. Speed & multi-threading (already quite well addressed here I think)
  4. Good support for non-RGB images
  5. Support for different types of input, like Matrix{SomeColorType}, Matrix{AbstractVector}, Array{T,N} with obsdim or even NamedDimsArray{CWH} to clarify the dimension meaning.
  6. Related to 5: Not having to transpose my data before and after, because my data comes in WHC and Flux needs WHCN. But I think for augmentations it's ok to use CWH, because what's close in RAM gets transformed together, which SIMDs well.
  7. Sofa-comfortable API

So for the time being, in my opinion development should focus on 1 & 2, things like this issue can come later unless they are a blocker for other work.

johnnychen94 commented 3 years ago

There are two things that augmentbatch! does quite well:

Just noticed the example you provided in https://github.com/Evizero/Augmentor.jl/issues/54#issuecomment-897022475

outbatch(X) = Array{Float32}(undef, (28, 28, 1, nobs(X)))
augmentbatch((X, y)) = augmentbatch!(outbatch(X), X, pl), y

Every call of augmentbatch would introduce a memory allocation and thus hurt the overall performance somewhat.

I didn't check it, but I'm wondering if augment + DataLoader could replace augmentbatch!.
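For illustration, the idea could look roughly like this (untested sketch; assumes MLUtils' DataLoader or the equivalent in Flux, a vector of images imgs with matching labels, and a pipeline pl):

using MLUtils: DataLoader

loader = DataLoader((imgs, labels); batchsize = 32, shuffle = true)
for (xs, ys) in loader
    aug = augment.(xs, Ref(pl))  # same pipeline, fresh random parameters per image
    # ... collate `aug` into an array and feed it to the model together with `ys`
end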

barucden commented 3 years ago

> keep in mind that Julia is column-major; I think your table above reflects row-major order

Yes, you are right. I am sorry if the order that I used in the table confused anyone. Hopefully, it did not overshadow the main point of the proposal.

I now realize I was maybe too eager to propose this. I did not have a clear vision of how to proceed further, and now I see there are more steps to take before we can even discuss what's proposed in the original post.

I took a quick look at DataAugmentation.jl, and I think we have some catching up to do in some aspects.

@johnnychen94 Doesn't my example allocate memory for each batch too? In my work, batches quite often have different sizes, so I tend to just allocate a new batch each time. I'll check DataLoader. Thanks for the reference.