Evizero / Augmentor.jl

A fast image augmentation library in Julia for machine learning.
https://evizero.github.io/Augmentor.jl/

Segmentation tasks: augmentation of GT images #85

Closed barucden closed 2 years ago

barucden commented 2 years ago

In the case of segmentation tasks, an input consists of an image and a ground-truth segmentation. To augment both the image and GT segmentation using the same operations, we can use

augment((img, gt), pipeline)

However, this cannot be used if the pipeline includes an operation that adjusts the colors of images, because that would damage the color encoding in the GT segmentations. As of now, we do not have such operations, but non-geometric operations (such as changing contrast) were requested in #16, and PR #84 introduces the first such operations.

Notice that we cannot just use

augment(img, pipeline)

because if the pipeline contains affine operations, we want them applied to the GT segmentation too.
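For example (GammaCorrection here is only a placeholder for a hypothetical color-adjusting operation, not an existing Augmentor one):

# rotation is fine for both, but the color operation would corrupt
# the label values stored in aug_gt
pl = Rotate(-10:10) |> GammaCorrection(0.5)
aug_img, aug_gt = augment((img, gt), pl)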


Generally, some operations (e.g., all affine operations) should be applied to both the image and the GT, while others (e.g., contrast adjustment) should be applied only to the image (and not the GT). I think this raises two questions:

  1. How to distinguish the image-only operations from image-and-gt operations?
  2. How to adjust the API so it is clear whether the user passes only images, or images with GT segmentations?

I think that (1) could be resolved by introducing a new abstract type ImageOnlyOperation <: Operation. All operations that would damage GT segmentations would be subtypes of this abstract type. The existing type ImageOperation would denote operations that should be applied to everything (images and segmentations).
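For illustration, a minimal sketch of what (1) could look like (ContrastAdjustment is only a placeholder name for the kind of operation introduced in #84):

# placeholder sketch of the proposed hierarchy
abstract type ImageOnlyOperation <: Operation end

# a color-changing operation that must never touch GT segmentations
struct ContrastAdjustment <: ImageOnlyOperation
    factor::Float64
end

# geometric operations stay plain ImageOperation/AffineOperation subtypes
# and get applied to both the image and the GT segmentation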

For (2), I think the API could be

augment(img, gt, pipeline)
augment(imgs, gts, pipeline) # length(imgs) == length(gts)
johnnychen94 commented 2 years ago

augment(imgs, gts, pipeline) looks like a missing API for Augmentor, but this should be discussed in two aspects:

augment(img, gt, pl) should only be considered as a convenient version of augment((img, gt), pl)

Generally, we prefer the trait version. Hardcoding the second input as ground truth might not be clear in all scenarios. It is also not clear which operations should be placed under the ImageOnlyOperation category: in some scenarios an operation is expected to be applied to the ground truth, while in others it is not.


With a similar idea in mind, we can introduce a new array type on top of img that stores whether the image should be skipped, together with a set of filter conditions and wrapper operations.

abstract type AbstractAugmentorCondition end
struct MatchSkip <: AbstractAugmentorCondition
    content::Symbol
end

struct AugmentorImage{T, N, AT<:AbstractArray{T, N}, C<:AbstractAugmentorCondition} <: AbstractArray{T, N}
    data::AT
    condition::C
end

struct IfOp{T<:ImageOperation, C<:AbstractAugmentorCondition} <: ImageOperation
    operation::T
    condition::C
end

function applyeager(op::IfOp, img::AugmentorImage)
    if match_condition(op.condition, img.condition)
        # the image's condition matches (e.g., it is tagged as GT), so skip the wrapped operation
        return img
    else
        # otherwise apply the wrapped operation (not the IfOp itself) to the underlying data
        return AugmentorImage(applyeager(op.operation, img.data), img.condition)
    end
end

With this, we can tag images and pipelines so that they can be skipped under certain conditions, e.g.:

cond = MatchSkip(:gt)
gts = map(x->AugmentorImage(x, cond), gts)
pl = Rotate([10, -5, -3, 0, 3, 5, 10]) |> IfOp(AdjustContrastBrightness(...), cond)
map(imgs, gts) do img, gt
    augment(img, gt, pl)
end

What do you think of this?

barucden commented 2 years ago

To be honest, it seems too complex to me. Let me try to elaborate.

Operations and masks

I have experience mostly with Python libraries, mainly Albumentations and transforms from torchvision. The state of torchvision transforms is pretty much the same as ours -- GT segmentations are not considered -- so it is not relevant here. On the other hand, Albumentations directly provides an API like this:

# Assumes: from albumentations import Compose, HorizontalFlip, RandomBrightnessContrast

# Define an augmentation pipeline
transform = Compose([HorizontalFlip(), RandomBrightnessContrast()])

# Without a mask:
transformed = transform(image=image)
# transformed["image"] is the augmented image

# With a mask:
transformed = transform(image=image, mask=mask)
# transformed["image"] is the augmented image and transformed["mask"] the augmented mask

Therefore, Albumentations decides which operations are suitable for both images and masks and which are suitable only for images. Basically, the rule is "if an operation moves pixels, use it for both the image and the mask; if an operation changes colors, use it only for the image". I think this makes sense, as any other way would damage the mask.

I am not 100% sure if the rule is really that simple -- I would have to double-check. But the point is that Albumentations decides which operations are suitable for masks too. Since the library is quite popular, I would guess it covers most/all use cases.

So in my opinion, whether an operation is suitable for masks should be a property of the operation, and it should be defined by us.

API

I would not mind calling

augment((img, gt), pl)

but it seems inconsistent that a 2-tuple would mean an image and a GT segmentation while an n-tuple (n>2) would mean n images:

augment((img1, img2, img3), pl)

Maybe we could use a Pair instead of a 2-tuple:

augment(img => gt, pl)
augment_batch(imgs => gts, pl) # I would love a non-mutating version of augment_batch!
johnnychen94 commented 2 years ago

My main concern about augment(img, gt, pl) is its ambiguity with augment((img, gt), pl). augment(img => gt, pl) sounds like a good proposal to me.

Basically, the rule is "if an operation moves pixels, use it for both the image and mask; if an operation changes colors, use it only for the image" ... So in my opinion, whether an operation is suitable for masks should be a property of the operation, and it should be defined by us.

I need to think about this and take a look at Albumentations. I will reply in a few days. A quick question: does augment(img => gt, pl) already satisfy the "mask" idea?

johnnychen94 commented 2 years ago

I would love a non-mutating version of augment_batch!

Yes, this looks like a missing function. And our existing documentation doesn't explain this very well.

My main focus is still on JuliaImages, so I haven't taken care of this package very well. If you are interested in maintaining this package and adding new features, I believe @Evizero would be delighted to send you an invitation.

barucden commented 2 years ago

Does augment(img => gt, pl) already satisfy the "mask" idea?

(just to be clear: "mask" = "gt" = "gt segmentation") I think it satisfies the mask idea from the point of view of the API call. IMHO it is simple enough and unambiguous. The return type could also be a Pair of two images (image + mask). So:

aug_img = augment(img, pl) # Augment just an image, mask not provided (already have this)
aug_img, aug_mask = augment(img => mask, pl) # Augment an image and the corresponding mask (proposed)

If you are interested in maintaining this package and adding new features, I believe @Evizero would be delighted to send you an invitation.

I would be interested in maintaining this package and adding new features; however, I don't feel experienced enough to just push new code without consulting others. I would feel more comfortable if someone checked my PRs first.


Slightly off-topic: if we had augment_batch (next to the existing augment_batch!), what would be the difference between these:

# 1. Only images provided:
# a)
aug_imgs = augment(imgs, pl)
# b)
aug_imgs = augment_batch(imgs, pl)

# 2. Images and masks provided:
# a)
aug_imgs, aug_masks = augment(imgs => masks, pl)
# b)
aug_imgs, aug_masks = augment_batch(imgs => masks, pl)
Evizero commented 2 years ago

I have not thought this through, I will admit, but an alternative way of approaching the mask problem would be with a decorator that could be dispatched on.

img1_out, img2_out = augment((img1, Mask(img2)), pl)
johnnychen94 commented 2 years ago

A good suggestion! The Mask version is more generic, as it not only allows img => mask pairs but also other combinations, e.g., (img, mask) => gt.

barucden commented 2 years ago

Yes, I also like this suggestion! Would augment keep the types of its inputs? I.e., would Mask also be returned?

johnnychen94 commented 2 years ago

Generally, we should keep the information and avoid changing the type as much as we can, so it should return a Mask if the input is a Mask.

There might be some glue code and utilities needed to interact with other ecosystems, e.g., Flux. For example, we might need to provide convenience functions to uniformly strip/add the Mask wrapper for batch inputs.
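For example, something along these lines (hypothetical helpers, assuming the Mask wrapper provides an unwrap accessor):

# strip the Mask wrapper before handing a batch to, e.g., Flux
strip_wrappers(batch) = map(x -> x isa Mask ? unwrap(x) : x, batch)
# wrap a batch of segmentation arrays as masks before augmentation
wrap_masks(batch) = map(Mask, batch)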

barucden commented 2 years ago

I checked Albumentations and it seems that the rule actually is "pixels move => apply to masks too; colors change => apply to the image only".

Also note that they support augmentation of bounding boxes and key points. I don't think we need to implement it right now; just to keep in mind that it might be requested in the future.

List of transforms

Dual transforms (applied on images and masks, and possibly bounding boxes and key points), generated by `grep -r "(DualTransform)" | cut -d ' ' -f 2 | cut -d '(' -f 1 | sort`:

Affine, _BaseRandomSizedCrop, CenterCrop, CoarseDropout, Crop, CropAndPad, CropNonEmptyMaskIfExists, ElasticTransform, Flip, GridDistortion, GridDropout, HorizontalFlip, LongestMaxSize, MaskDropout, NoOp, OpticalDistortion, PadIfNeeded, Perspective, PiecewiseAffine, RandomCrop, RandomCropNearBBox, RandomGridShuffle, RandomRotate90, RandomScale, RandomSizedBBoxSafeCrop, Resize, Rotate, SafeRotate, ShiftScaleRotate, SmallestMaxSize, Transpose, VerticalFlip

Image-only transforms, generated by `grep -r "(ImageOnlyTransform)" | cut -d ' ' -f 2 | cut -d '(' -f 1 | sort`:

Blur, ChannelDropout, ChannelShuffle, CLAHE, ColorJitter, Cutout, Downscale, Emboss, Equalize, FancyPCA, FDA, FromFloat, GaussianBlur, GaussNoise, HistogramMatching, HueSaturationValue, ImageCompression, InvertImg, ISONoise, MultiplicativeNoise, Normalize, Posterize, RandomBrightnessContrast, RandomFog, RandomGamma, RandomRain, RandomShadow, RandomSnow, RandomSunFlare, RandomToneCurve, RGBShift, Sharpen, Solarize, Superpixels, ToFloat, ToGray, ToSepia
maxfreu commented 2 years ago

I think the best option would be not to touch the image types too much and to go for the pair notation proposed above. One could then introduce a new function, applytomasks, check it for each transformation in the pipeline, and throw out those for which it is false.

applytomasks(::Any) = true
applytomasks(::Rotate90) = true
applytomasks(::ColorJitter) = false
barucden commented 2 years ago

What I liked about @Evizero's proposal (let's call it the tuple notation) was that it does not introduce a new "convention", whereas the pair notation does. But on second thought, maybe the pair notation is actually good.

The tuple notation allows, e.g.,

augment((img1, Mask(img2), Mask(img3), img4, img5), pl)

which does not have a clear meaning to me.

I believe that we should either accept an image, or an image with the corresponding label (a mask for segmentation tasks). And that's what a pair implies, right?

augment(img, pl) # Augment an image
augment(img => label, pl) # Augment an image and its label

For segmentation tasks, both img and label (a mask) are AbstractMatrix. However, generally, label could be anything (e.g., bounding boxes or key points), and we could dispatch on it:

image(imglabel::Pair) = imglabel.first
label(imglabel::Pair) = imglabel.second

augment(img::AbstractMatrix, pl) = augment_image(img, pl)
augment(imglabel::Pair, pl) = augment_image(image(imglabel), pl) => augment_label(label(imglabel), pl)

The function augment_image is basically what we have now, and the implementation of augment_label might follow something like what @maxfreu proposed above.
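A rough sketch of what that could look like (hypothetical names; the pipeline is treated as a plain collection of operations here, and sharing the sampled random parameters with augment_image is ignored for simplicity):

function augment_label(label::AbstractMatrix, operations)
    # keep only the operations that are safe to apply to a mask
    safe_ops = filter(applytomasks, operations)
    return augment_image(label, safe_ops)
end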

maxfreu commented 2 years ago

Another option that comes to my mind, which would be super user-friendly and probably extensible to keypoints etc., is to just use a named tuple (image=..., mask=..., keypoints=...). That would be approximately of type NamedTuple{(:image, :mask), Tuple{AbstractArray, AbstractArray}}, so one could also dispatch well on it. When keypoint transformations arrive at a later point in time, it should be easier to extend than a pair. As a bonus, it's quite close to the Albumentations notation, which many are familiar with.
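A rough sketch of such a dispatch (assuming the Mask decorator from the comments above and a hypothetical unwrap helper):

# dispatch on the names of the named tuple; wrap/unwrap internally
function augment(nt::NamedTuple{(:image, :mask)}, pl)
    img, mask = augment((nt.image, Mask(nt.mask)), pl)
    return (image=img, mask=unwrap(mask))
end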

Evizero commented 2 years ago

What I like about the dispatch version is that image operations can decide themselves what they apply to, and it's extensible for users, who can define their own decorator types with their own properties. It also allows for an arbitrary number of images, masks, etc.

applyto(::Mask, ::ColorJitter) = false

What I dislike about the pair notation is its limited scope: it hardcodes that there are two types of image data with a fixed semantic.

johnnychen94 commented 2 years ago

I agree with @Evizero. I think we should build the core functionality by dispatching on the image level instead of the collection level, because it allows more flexible composition, and then wrap it in a thin layer that provides a convenient user interface for segmentation semantics, i.e., hardcoding the pair semantic or introducing a new function name.

Say we have an image restoration task with some ROI, where the label is the restored image: how would the pair version support the (img, mask) => gt case, in which all transformations should apply to img and gt while only spatial transformations apply to mask?
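With the decorator/tuple form this scenario is expressed directly (a sketch, assuming the Mask wrapper):

# every operation reaches img and gt; only spatial ones reach the wrapped mask
img_out, mask_out, gt_out = augment((img, Mask(mask), gt), pl)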

barucden commented 2 years ago

@Evizero It's hard to argue against your points. I guess that the only issue I have with this approach is the fact that the Mask wrapper would be returned too.

img, mask = data[i]
augmented_img, augmented_mask = augment((img, Mask(mask)), pl)
# now typeof(mask) == typeof(augmented_mask) does not hold

I, as a user, don't care about the Mask type. I just use it to tell Augmentor that an image is a mask. So I would like it better if Mask were not returned from augment. Would that be viable?

johnnychen94 commented 2 years ago

We can hide Mask as an internal implementation and not export it.

Meanwhile, use the pair semantic or introduce a new function to do all the wrap-and-unwrap work.

johnnychen94 commented 2 years ago

There are two levels of dispatch happening here: the internal, image-level dispatch on the decorator types, and the user-level API dispatch that provides convenient semantics for common cases (e.g., the pair version).

barucden commented 2 years ago

Using the pair notation in the user-level API would still hardcode two types of image data with a fixed semantic, wouldn't it? Also it would not be as flexible as the tuple notation. I like the possibility (even though I don't have a use-case for it now) of doing

augment((img, Mask(mask), Mask(mask2), KeyPoint(kp), BBox(bbox), Whatever(we)), pl)

which is hard to cover with any single convention (such as image => mask). The only thing I dislike about this is the need to unwrap the returned types. Is there a way to tell Augmentor "this structure is just a decorator for you to understand the data semantics; we want you to return the underlying structure"? Would that be bad practice?

johnnychen94 commented 2 years ago

I'm sorry if I didn't make it clear in my previous comments: we can interpret augment on a pair input, or on the named-tuple input that @maxfreu suggested, for example:

function augment(p::Pair{<:AbstractArray, <:AbstractArray}, pl)
    # augment image and mask together, then strip the wrapper from the mask
    f, s = augment((p.first, Mask(p.second)), pl)
    return f => unwrap(s)
end

This is the user-level API dispatch as I mentioned in https://github.com/Evizero/Augmentor.jl/issues/85#issuecomment-887408439

barucden commented 2 years ago

So for this specific (and perhaps very common) case where we want to augment an image and its mask, there would be a convenience method, and for other cases (e.g., one image & many masks), the user would call augment with the decorators and then they would unwrap the returned types themselves?

johnnychen94 commented 2 years ago

Yes, that's my vision here.

The internal functionality should dispatch on the image level with the decorators because it allows more flexible combinations, but for the API part we can be creative and make it intuitive for specific use cases. Of course, this needs to be well documented.

barucden commented 2 years ago

Okay. That would combine the advantage of both approaches.

To sum up...

Wrappers

We introduce an abstract type for the wrappers, let's say AbstractWrapper, and one wrapper for now, Mask <: AbstractWrapper. Each wrapper would define the unwrap method, so for Mask we have constructor Mask(mask) and unwrap(m::Mask).
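A minimal sketch (whether the wrappers should also subtype AbstractArray is left open here):

abstract type AbstractWrapper end

struct Mask{AT<:AbstractArray} <: AbstractWrapper
    data::AT
end

unwrap(m::Mask) = m.data
unwrap(x::AbstractArray) = x   # plain images pass through unchanged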

Internal implementation

Operations define whether they should be applied to masks. This can be done as @Evizero proposed:

applyto(::Mask, ::ColorJitter) = false

The augment method goes through all pipeline operations and, for each image/wrapper, checks applyto and applies the operation to the image/wrapper only if it returns true.

It might be useful to define a new class of operations, let's say ColorOperation <: ImageOperation. All operations that change colors would be subtypes of this type (it is only ColorJitter now). We could then define

applyto(::Mask, ::ColorOperation) = false
applyto(::Mask, ::AffineOperation) = true
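Conceptually, the internal loop could look roughly like this (a sketch only, not Augmentor's actual pipeline machinery; sample_params and apply_op are hypothetical stand-ins):

function augment_all(inputs::Tuple, operations)
    for op in operations
        p = sample_params(op)   # random parameters sampled once, shared by all inputs
        inputs = map(x -> applyto(x, op) ? apply_op(op, x, p) : x, inputs)
    end
    return inputs
end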

User-level API

The augment method accepts a tuple of images or wrappers. Each image/wrapper is transformed with the exact same operations given in the pipeline (i.e., random parameters are generated only once for each operation).

Also, for ease of use, we introduce a new convenience method for segmentation tasks which could look like @johnnychen94 proposed:

# Convenience method for augmenting image and segmentation mask
function augment(p::Pair{<:AbstractArray, <:AbstractArray}, pl)
    # augment image and mask together, then strip the wrapper from the mask
    f, s = augment((p.first, Mask(p.second)), pl)
    return f => unwrap(s)
end

so that the user can augment an image and its mask as

aug_img, aug_mask = augment(img => mask, pl)

However, the user can always define the semantics themselves and augment any combination of input types:

augmented = augment((img1, Mask(img2), Mask(img3), img4), pl)
aug_img1, aug_img2, aug_img3, aug_img4 = unwrap.(augmented)

Future extensions

In case additional features are requested, such as using keypoints instead of masks, we would just implement a new wrapper, say KeyPoints, define appropriate applyto methods, and perhaps add another convenience method:

function augment(p::Pair{<:AbstractArray, <:SomeTypeHoldingKeyPoints}, pl)
    f, s = augment((p.first, KeyPoints(p.second)), pl)
    return f => unwrap(s)
end

Does this seem correct?

johnnychen94 commented 2 years ago

This is exactly what I have in mind; thank you so much for putting it together!

FWIW, to avoid name conflicts, none of these AbstractWrapper subtypes should be exported, so users are expected to write using Augmentor: Mask or Augmentor.Mask for every usage.
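For example (hypothetical usage following the summary above):

using Augmentor: Mask   # not exported, so it has to be brought in explicitly

aug_img, aug_mask = augment(img => mask, pl)            # convenience method
outputs = augment((img, Mask(mask1), Mask(mask2)), pl)  # explicit wrappers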

johnnychen94 commented 2 years ago

This is off-topic: https://invenia.github.io/blog/2019/11/06/julialang-features-part-2/ is a very good explanation of the applyto(::Mask, ::ColorJitter) = false trait design.

barucden commented 2 years ago

Alright, great! I will have some free time over the weekend, so if there are no objections, I can try to come up with a PR then.