facebookresearch / AugLy

A data augmentations library for audio, image, text, and video.
https://ai.facebook.com/blog/augly-a-new-data-augmentation-library-to-help-build-more-robust-ai-models/

Is it possible to seed augmentations #119

Open ierezell opened 2 years ago

ierezell commented 2 years ago

🚀 Feature

I checked the code, and it doesn't seem possible to seed the augmenters.

Motivation

To reproduce the same augmented test sets, and to deterministically augment small test sets.

Pitch

Add the ability to seed augmenters, e.g. My_Augmenter(**my_params, seed=42), so that augmentations are reproducible. The output need not be identical for every item in a batch, but the sequence of augmentations should be the same on every run.
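To make the request concrete, here is a minimal sketch of what such a seeded augmenter could look like. SeededCaseAugmenter, its p parameter, and the upper-casing behaviour are all hypothetical and not part of AugLy; the only point is that one private generator per augmenter gives different draws from call to call while the whole sequence stays reproducible.

```python
import random
from typing import List, Optional


class SeededCaseAugmenter:
    """Hypothetical augmenter (not part of AugLy): randomly upper-cases words."""

    def __init__(self, p: float = 0.5, seed: Optional[int] = None) -> None:
        self.p = p
        # One private generator per augmenter, so global random state is untouched.
        self.rng = random.Random(seed)

    def __call__(self, texts: List[str]) -> List[str]:
        out = []
        for text in texts:
            # Each word consumes one draw from the augmenter's own generator.
            words = [
                w.upper() if self.rng.random() < self.p else w
                for w in text.split()
            ]
            out.append(" ".join(words))
        return out


batch = ["the quick brown fox jumps", "over the lazy dog today"]

# Two augmenters built with the same seed replay the same sequence of draws.
run_a = SeededCaseAugmenter(seed=42)
run_b = SeededCaseAugmenter(seed=42)
first_a, second_a = run_a(batch), run_a(batch)
first_b, second_b = run_b(batch), run_b(batch)
```

Here first_a matches first_b and second_a matches second_b, because both objects consume their generators in the same order. Whether a seed belongs on the augmenter object or on each call is a design choice this sketch does not settle.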

Alternatives

Don't use augmentation in the test set.

Additional context

I'm working with text, and I would like to augment a test set, but the idea also applies to images and audio.

Thanks in advance and thanks a lot for this nice library!

zpapakipos commented 2 years ago

Hi @Ierezell, thanks for this request! We currently have a seed arg for some augmentations that include random sampling within them (e.g. change_case in text https://github.com/facebookresearch/AugLy/blob/main/augly/text/functional.py#L65 or perspective_transform in image https://github.com/facebookresearch/AugLy/blob/main/augly/image/functional.py#L1719), but it's true that this is missing for some (e.g. insert_punctuation_chars in text https://github.com/facebookresearch/AugLy/blob/main/augly/text/functional.py#L144).

I will add this to our list of tasks to do soon: adding seed args to all augmentations that involve random sampling and don't already have them.

sciencecw commented 7 months ago

It is good that change_case has a seed argument, but with the current implementation, calling it without specifying a seed keeps generating the same result.
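The behaviour described here is what you would expect from a function that rebuilds its generator from a fixed default seed on every call. The sketch below uses a hypothetical stand-in, toy_change_case, with an assumed default of seed=10; it is not AugLy's actual implementation. It also shows one possible workaround: drawing a fresh per-call seed from a single master generator, which gives varied yet reproducible results.

```python
import random
from typing import Optional


def toy_change_case(text: str, seed: Optional[int] = 10) -> str:
    """Hypothetical stand-in, not AugLy's change_case: randomly upper-cases characters."""
    rng = random.Random(seed)  # re-created from the seed on every call
    return "".join(c.upper() if rng.random() < 0.5 else c for c in text)


text = "reproducible augmentations"

# Default seed: every call rebuilds the same generator, so the output never varies.
same_1 = toy_change_case(text)
same_2 = toy_change_case(text)

# Workaround: draw a fresh per-call seed from one master generator.
master = random.Random(1234)
varied = [toy_change_case(text, seed=master.randrange(2**32)) for _ in range(3)]

# Re-creating the master generator replays the same per-call seeds.
master_again = random.Random(1234)
replayed = [toy_change_case(text, seed=master_again.randrange(2**32)) for _ in range(3)]
```

In this sketch, passing seed=None would make random.Random fall back to system entropy, which gives non-repeating results without reproducibility.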

javiabellan commented 7 months ago

AugLy is cool, but many transformations are not random.

Here is a code snippet to make any transformation random (even with the random distributions you prefer):

import numpy as np
import augly.image as imaugs

def random_uniform(low, high):
    # Returns a sampler; low/high avoid shadowing the built-ins min and max
    return lambda: np.random.uniform(low, high)

def random_normal(mean, std):
    return lambda: np.random.normal(mean, std)

def randomize(fn_2_randomize, **kwargs):
    # kwargs maps each parameter name to a zero-argument sampler function
    def randomized_fn(img_pil):
        # 1) Sample a concrete value for every randomized parameter
        fixed_args = {}
        for key_paramName, value_randomFn in kwargs.items():
            fixed_args[key_paramName] = value_randomFn()
        # 2) Call the wrapped function with the sampled values
        return fn_2_randomize(img_pil, **fixed_args)
    return randomized_fn

random_overlay_text = randomize(imaugs.overlay_text,
                                x_pos=random_uniform(0,0.5),
                                y_pos=random_uniform(0,0.9))

# Now you can call it :) (img is a PIL image you have already loaded)
random_overlay_text(img)
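
Tying this back to the original request, the randomize wrapper above could also be made reproducible by giving it its own seeded generator. The following is only a sketch under stated assumptions: seeded_randomize, uniform, and fake_overlay are hypothetical names, fake_overlay stands in for a real augmentation such as imaugs.overlay_text, and the standard-library random.Random is swapped in for np.random to keep the example dependency-free.

```python
import random
from typing import Any, Callable, Dict


def seeded_randomize(
    fn: Callable[..., Any],
    seed: int,
    **samplers: Callable[[random.Random], float],
) -> Callable[[Any], Any]:
    """Variant of the randomize() wrapper that owns a seeded generator."""
    rng = random.Random(seed)

    def randomized_fn(x: Any) -> Any:
        # Draw one concrete value per parameter from the shared generator.
        fixed_args: Dict[str, float] = {
            name: sample(rng) for name, sample in samplers.items()
        }
        return fn(x, **fixed_args)

    return randomized_fn


def uniform(low: float, high: float) -> Callable[[random.Random], float]:
    # The sampler receives the generator instead of using global random state.
    return lambda rng: rng.uniform(low, high)


def fake_overlay(img: str, x_pos: float, y_pos: float):
    """Hypothetical stand-in for an augmentation such as imaugs.overlay_text."""
    return (img, x_pos, y_pos)


aug_a = seeded_randomize(fake_overlay, seed=0, x_pos=uniform(0, 0.5), y_pos=uniform(0, 0.9))
aug_b = seeded_randomize(fake_overlay, seed=0, x_pos=uniform(0, 0.5), y_pos=uniform(0, 0.9))

outputs_a = [aug_a("img") for _ in range(3)]
outputs_b = [aug_b("img") for _ in range(3)]
```

Both wrappers produce the same sequence of sampled positions, while successive calls on one wrapper keep drawing new values.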