aleju / imgaug

Image augmentation for machine learning experiments.
http://imgaug.readthedocs.io
MIT License

Repeating augmentations in multiprocessing.Pool #779

Open SNeugber opened 3 years ago

SNeugber commented 3 years ago

python: 3.7.10, imgaug: 0.4.0, numpy: 1.20.3

I've found a curious issue when running an existing imgaug augmenter with multiprocessing.Pool: the augmentations seem to repeat in a cycle.

For example, with the code below, each output image is either the right way up (u) or inverted (i). The multiprocess run repeats the pattern [u, i, i, i, u] 20 times across the 100 images, whereas the single-process run produces (pseudo-)random results.

Maybe it's worth adding a warning to the documentation if this isn't something that can be "fixed"?

from multiprocessing import Pool
from pathlib import Path

import imgaug.augmenters as iaa
import numpy as np
from PIL import Image

class Runner:
    def __init__(self, output_dir: Path, seed: int):
        self.augs = iaa.Sequential([iaa.Flipud(p=0.5)])
        self.augs.seed_(seed)
        self.output_dir = output_dir
        self.output_dir.mkdir(exist_ok=True, parents=True)
        self.img = Image.open("./noop_image.jpg")

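    def run_multiprocess(self):
        # self.augment_image is a bound method, so self (including the
        # seeded augmenter) is pickled and sent to the worker processes.
        with Pool(5) as pool:
            paths = pool.map(self.augment_image, range(100))
        return paths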

    def run_single_process(self):
        return [self.augment_image(i) for i in range(100)]

    def augment_image(self, i: int) -> Path:
        img_augmented = self.augs.augment_image(np.array(self.img))
        img_path = self.output_dir / f"img_{i}.png"
        Image.fromarray(img_augmented).save(img_path)
        return img_path

if __name__ == '__main__':
    runner = Runner(output_dir=Path("./runner_single_process"), seed=10)
    runner_mp = Runner(output_dir=Path("./runner_multiprocess"), seed=10)
    _ = runner.run_single_process()
    _ = runner_mp.run_multiprocess()
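
A possible explanation, offered as a sketch rather than a confirmed diagnosis: pool.map pickles the bound method self.augment_image, and with it the seeded augmenter, once per chunk of tasks. With 100 tasks and 5 workers the default chunk size should work out to 5, so each of the 20 chunks would start from the same random state, which would match the 5-long pattern repeating 20 times. The snippet below imitates that mechanism with the standard library only. FakeAugmenter, simulate_pool_map and flip_with_task_seed are hypothetical names, not imgaug API, and copy.deepcopy stands in for the pickle round-trip that multiprocessing performs.

```python
import copy
import random


class FakeAugmenter:
    """Hypothetical stand-in for a seeded augmenter (not imgaug API)."""

    def __init__(self, seed: int):
        self.rng = random.Random(seed)

    def flip(self, i: int) -> str:
        # "i" = inverted, "u" = right way up, each with probability 0.5.
        return "i" if self.rng.random() < 0.5 else "u"


def simulate_pool_map(aug: FakeAugmenter, n_tasks: int, chunksize: int) -> list:
    """Imitate shipping one copy of the augmenter per chunk of tasks.

    The parent's augmenter is never used directly, so its RNG state never
    advances and every chunk receives an identical copy.
    """
    results = []
    for start in range(0, n_tasks, chunksize):
        # Assumption: deepcopy approximates the pickle/unpickle round-trip.
        worker_copy = copy.deepcopy(aug)
        stop = min(start + chunksize, n_tasks)
        results.extend(worker_copy.flip(i) for i in range(start, stop))
    return results


def flip_with_task_seed(base_seed: int, i: int) -> str:
    """Possible workaround: derive a fresh RNG from the task index."""
    rng = random.Random(base_seed * 1_000_003 + i)
    return "i" if rng.random() < 0.5 else "u"


if __name__ == "__main__":
    repeated = simulate_pool_map(FakeAugmenter(seed=10), n_tasks=100, chunksize=5)
    print("".join(repeated))  # the first 5 symbols repeat 20 times

    per_task = [flip_with_task_seed(10, i) for i in range(100)]
    print("".join(per_task))  # reproducible, but not tied to chunk boundaries
```

In the repro above, the analogous change would be to reseed inside augment_image with a value derived from i (for example via self.augs.seed_), though that is untested against imgaug 0.4.0. imgaug also provides imgaug.multicore.Pool for augmenting batches across processes, which may be worth trying instead of a plain multiprocessing.Pool.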