[TensorFlow] CoarseDropout produces black stripes in random positions on the image

roma-glushko commented 3 years ago

🐛 Bug

I'm using albumentations with TF-GPU 2.5, and in this setup the CoarseDropout augmentation produces black stripes on my training images for some reason.

Here is how my data loading happens:

def augment_image(inputs, labels, augmentation_pipeline: a.Compose, seed: int = 42):
    def apply_augmentation(images):
        random.seed(seed)
        np.random.seed(seed)

        aug_data = augmentation_pipeline(image=images.astype('uint8'))

        return aug_data['image']

    inputs = tf.numpy_function(func=apply_augmentation, inp=[inputs], Tout=tf.uint8)

    return inputs, labels

def get_dataset(
        dataset_path: str,
        subset_type: str,
        augmentation_pipeline: a.Compose,
        validation_fraction: float = 0.2,
        batch_size: int = 32,
        image_size: Tuple[int, int] = (300, 300),
        seed: int = 42
) -> tf.data.Dataset:
    augmentation_func = partial(
        augment_image,
        augmentation_pipeline=augmentation_pipeline,
        seed=seed,
    )

    dataset = image_dataset_from_directory(
        dataset_path,
        subset=subset_type,
        class_names=class_names,
        validation_split=validation_fraction,
        image_size=image_size,
        batch_size=batch_size,
        seed=seed,
    )

    return dataset \
        .map(augmentation_func, num_parallel_calls=AUTOTUNE) \
        .prefetch(AUTOTUNE)

Then I run the following snippet and get the stripes on my examples:

train_dataset = get_dataset(
    config.train_dataset_path,
    'training',
    config.train_augmentation,
    validation_fraction=0.2,
    batch_size=config.batch_size,
    image_size=config.image_size,
    seed=config.seed,
)

plt.figure(figsize=(10, 10))

for image_batch, _ in train_dataset.take(1):
    for idx in range(9):
        image = image_batch[idx].numpy().astype('uint8')

        ax = plt.subplot(3, 3, idx + 1)
        plt.imshow(image)
        plt.axis('off')

The stripes go away when I comment out the CoarseDropout augmentation in my augmentation pipeline, which looks like this:

args['train_augmentation'] = a.Compose([
    a.VerticalFlip(),
    a.HorizontalFlip(),
    a.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.1, brightness_by_max=False),
    a.CoarseDropout(p=0.0, max_holes=20, max_height=8, max_width=8, min_holes=10, min_height=8, min_width=8),
    a.GaussNoise(p=1.0, var_limit=(10.0, 50.0)),
])

To Reproduce

Steps to reproduce the behavior:

Clone the project state at 0.1.0-bugrep tag:

git clone --depth 1 --branch 0.1.0-bugrep https://github.com/roma-glushko/rock-paper-scissor

Pull dataset:

cd data
kaggle datasets download --unzip frtgnn/rock-paper-scissor

Install project deps:
```
poetry install
```
Make sure CoarseDropout augmentation is always on in the config file: https://github.com/roma-glushko/rock-paper-scissor/blob/master/configs/basic_config.py
Run the notebook: https://github.com/roma-glushko/rock-paper-scissor/blob/master/notebooks/data_augmentation.ipynb

Expected behavior

I expect CoarseDropout to always produce rectangles of the configured size.

Environment

Albumentations version (e.g., 0.1.8): 0.5.2
Python version (e.g., 3.7): 3.8.6
OS (e.g., Linux): Ubuntu 20.10
How you installed albumentations (conda, pip, source): poetry (pip-like)
tensorflow-gpu: 2.5.0 (for the sake of compatibility with RTX3070 (ampere arch.))

Dipet commented 3 years ago

I cannot reproduce the problem.

import albumentations as a
import cv2 as cv
import matplotlib.pyplot as plt

augs = a.Compose([
    a.VerticalFlip(p=1),
    a.HorizontalFlip(p=1),
    a.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.1, brightness_by_max=False, p=1),
    a.CoarseDropout(p=1.0, max_holes=20, max_height=8, max_width=8, min_holes=10, min_height=8, min_width=8),
    a.GaussNoise(p=1.0, var_limit=(10.0, 50.0)),
])

img = cv.imread("/home/dipet/Pictures/paper01-000.png")
img = cv.cvtColor(img, cv.COLOR_BGR2RGB)

plt.subplot(211)
plt.imshow(img, vmin=0, vmax=255)
plt.subplot(212)
plt.imshow(augs(image=img)["image"], vmin=0, vmax=255)
plt.show()

Could you provide a random seed that reproduces the problem? Alternatively, provide a dump of the applied arguments from ReplayCompose and the images they were applied to.

roma-glushko commented 3 years ago

@Dipet thank you for the reply!

It makes sense to me that the snippet above did not reproduce the issue. I suspect image_dataset_from_directory() has something to do with it (it may be related to https://github.com/albumentations-team/albumentations/issues/905), so you would be more likely to reproduce it if you load the images the same way I do.

In any case, here is an archive with the albumentations state after running:

for image_batch, _ in train_dataset.take(1):
    for idx in range(9):
        image = image_batch[idx].numpy().astype('uint8')

        ax = plt.subplot(3, 3, idx + 1)
        plt.imshow(image)
        plt.axis('off')

https://drive.google.com/file/d/13Tf-iFM7hjBqH7jntys3SqFHQdE0BDPG/view?usp=sharing

Dipet commented 3 years ago

It looks like you are trying to apply the augmentation to a batch of images. Try changing the line

aug_data = augmentation_pipeline(image=images.astype('uint8'))

to:

res_images = []
for img in images:
    aug_data = augmentation_pipeline(image=img.astype('uint8'))
    res_images.append(aug_data["image"])
return np.stack(res_images)
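
For context, here is a minimal NumPy-only sketch of why a batch passed as a single image would come out with stripes. It assumes the dropout transform fills boxes by indexing the first two axes as (height, width). The `cutout` helper below is a hypothetical stand-in written for illustration, not the albumentations implementation, and the hole coordinates are made up.

```python
import numpy as np


def cutout(img, holes, fill_value=0):
    """Hypothetical stand-in for a dropout transform: fills (x1, y1, x2, y2) boxes."""
    out = img.copy()
    for x1, y1, x2, y2 in holes:
        # The first two axes are treated as (height, width), whatever they really are.
        out[y1:y2, x1:x2] = fill_value
    return out


# A batch shaped like the output of a batched image dataset: (N, H, W, C).
batch = np.full((32, 300, 300, 3), 255, dtype=np.uint8)
hole = (100, 4, 108, 12)  # x1, y1, x2, y2: intended as one 8x8 hole

# Whole batch passed as one "image": axis 0 (batch index) acts as height
# and axis 1 (image rows) acts as width.
wrong = cutout(batch, [hole])

# Images 4..11 each lose rows 100..107 across the full width: a horizontal stripe.
stripe_images = [i for i in range(len(wrong)) if (wrong[i] == 0).any()]
stripe_is_full_width = bool((wrong[4, 100:108, :, :] == 0).all())

# Applying the transform per image yields the intended 8x8 square on each image.
right = np.stack([cutout(img, [hole]) for img in batch])
hole_pixels = int((right[0, :, :, 0] == 0).sum())

print(stripe_images, stripe_is_full_width, hole_pixels)
```

Under these assumptions, the "8x8 hole" on the batch becomes an 8-row stripe spanning the whole width in 8 consecutive images, while the per-image loop keeps the hole square.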

roma-glushko commented 3 years ago

@Dipet yes, that seems to be the cause of the issue. Thank you for the help 👍
