Supported mask formats with Albumentations

Your Question

From the documentation, both the API reference and the [user guide](https://albumentations.ai/docs/getting_started/mask_augmentation/), it is not clear which mask formats are supported and, more importantly, whether different mask formats can lead to different transformation outputs because of internal implementation details. Take, for example, a semantic segmentation task with 3 classes, A, B, and C, where each class has an associated mask (Ma, Mb, Mc) stored as a separate file. Leaving aside RLE and similar sparse formats, the most basic ways to encode a dense mask and augment a sample are:

1. Read Ma, Mb, and Mc as NumPy arrays and store them in a Python list, e.g. `masks`. The transform API then lets me call `transformed = transform(image=image, masks=masks)` and get the augmented image and masks.
2. Read Ma, Mb, and Mc as NumPy arrays and stack them into a single `mask` array of shape (H, W, C), where C=3 and each element is True or False. Let's refer to this as one-hot boolean encoding. The transform API then lets me call `transformed = transform(image=image, mask=mask)` and get the augmented image and mask pair.
3. Read Ma, Mb, and Mc as NumPy arrays and encode them into a single `mask` array of shape (H, W), where each element is the class index (0, 1, 2). Let's refer to this as integer tensor encoding. I can then call `transformed = transform(image=image, mask=mask)` and get the augmented image and mask pair.
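To make the three layouts concrete, here is a minimal NumPy-only sketch. It does not call Albumentations: the tiny `labels` array is a hypothetical stand-in for the three mask files, and `hflip` is a plain NumPy flip used as a stand-in for a transform. It also assumes the three masks are mutually exclusive and cover every pixel, since otherwise the integer encoding is not well defined.

```python
import numpy as np

# Hypothetical 4x4 label map standing in for the mask files Ma, Mb, Mc.
labels = np.array([
    [0, 0, 1, 1],
    [0, 2, 2, 1],
    [0, 2, 2, 1],
    [0, 0, 1, 1],
])

# Encoding 1: a Python list of per-class boolean masks, each of shape (H, W).
masks = [labels == c for c in range(3)]

# Encoding 2: one-hot boolean encoding, a single array of shape (H, W, C).
one_hot = np.stack(masks, axis=-1)

# Encoding 3: integer encoding, a single (H, W) array of class indices.
# argmax recovers the index only because exactly one class is set per pixel.
index_mask = one_hot.argmax(axis=-1)


def hflip(arr):
    """Stand-in transform: horizontal flip along the width axis."""
    return arr[:, ::-1]


flipped_list = [hflip(m) for m in masks]
flipped_one_hot = hflip(one_hot)
flipped_index = hflip(index_mask)

# For a pure flip, all three encodings describe the same result.
same_list_vs_one_hot = np.array_equal(
    np.stack(flipped_list, axis=-1), flipped_one_hot
)
same_one_hot_vs_index = np.array_equal(
    flipped_one_hot.argmax(axis=-1), flipped_index
)
print(same_list_vs_one_hot, same_one_hot_vs_index)
```

A flip only moves pixels around, so the encodings necessarily agree here. My concern is with transforms that resample pixels (rotation, scaling, elastic deformation), where the interpolation applied to a boolean channel and to a class index could in principle differ.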

Now, my questions are:

1. Does Albumentations support all 3 encodings for every transform?
2. Does the encoding type affect the output of a given transformation?
3. Is one approach better than the others in terms of performance?

albumentations-team / albumentations

Supported mask formats with Albumentations #1973
