Enhance augmentation objects with references to a random state.

Erotemic commented 2 months ago

Suggested Improvement

Looking at the current code, augmentation objects draw their random samples from the global random state in the random module. This is ideal for maximally random pipelines, which are affected by any other use of the global random state outside of albumentations itself.

This is not ideal for cases where the subcomponents of a system want their random generators to be seeded and not impacted by other components. For instance, right now there is no way for me to define a seeded augmentation pipeline that does not interfere with any other usage of the global random state.
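To make the interference concrete, here is a minimal sketch that uses only the stdlib random module. The names fake_augment and unrelated_component are hypothetical stand-ins, not albumentations code; the only assumption is that the augmentation samples from the global state as described above.

```python
import random


def fake_augment():
    # Hypothetical stand-in for an augmentation that samples
    # from the global random state.
    return random.random()


def unrelated_component():
    # Any other code in the system that also draws from the global state.
    return random.random()


# Run 1: seed globally, then sample three "augmentation" values.
random.seed(0)
first = [fake_augment() for _ in range(3)]

# Run 2: same seed, but another component draws one value first.
random.seed(0)
unrelated_component()
second = [fake_augment() for _ in range(3)]

# The same seed no longer gives the same augmentation samples,
# because the stream is shifted by the unrelated draw.
print(first == second)
```

Even though both runs use the same seed, the augmentation samples differ because the unrelated draw advances the shared stream.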

I suggest adding a parameter to each augmentation class called seed, random_state, or rng that defaults to None. When it is None, it resolves to the global random state, which keeps the current behavior.

If it is an integer, a new random.Random object is created with that integer as its seed. If it is already a random.Random object, it is kept as-is. Either way, augmentation pipelines can be independent of the global random state while still using an internally consistent random state.
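The resolution rules described above could be sketched roughly as follows. This is only an illustration: resolve_rng and HypotheticalJitter are made-up names, not existing albumentations API, and returning the random module itself for the None case is one possible way to keep the global behavior.

```python
import random


def resolve_rng(rng=None):
    """Hypothetical helper: coerce None / int / random.Random to a random state."""
    if rng is None:
        # The random module exposes the same sampling functions
        # (random, uniform, randint, ...) backed by the global state.
        return random
    if isinstance(rng, random.Random):
        # Already a random state: keep it as-is.
        return rng
    if isinstance(rng, int):
        # Integer: treat it as the seed of a new, independent state.
        return random.Random(rng)
    raise TypeError(f"Cannot resolve {type(rng).__name__} to a random state")


class HypotheticalJitter:
    """Made-up augmentation that owns a reference to its random state."""

    def __init__(self, limit=0.2, rng=None):
        self.limit = limit
        self.rng = resolve_rng(rng)

    def sample(self):
        # Draw from the instance's state, not from the global one.
        return self.rng.uniform(-self.limit, self.limit)


# Two instances with the same seed give the same samples,
# regardless of what happens to the global state in between.
a = HypotheticalJitter(rng=0)
b = HypotheticalJitter(rng=0)
a_vals = [a.sample() for _ in range(3)]
random.random()  # unrelated global usage
b_vals = [b.sample() for _ in range(3)]
print(a_vals == b_vals)
```

With rng=None the instance falls back to the global state, so existing code that relies on random.seed would behave as it does today.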

Potential Benefits

Default behavior is unchanged
Makes it easy to test augmentation pipelines without modifying the global state
Makes it possible to set up a highly random, but consistent augmentation pipeline independent of any global random usage.

Additional Information

This is how the (now defunct) imgaug library handled randomness, where random states are explicitly passed and maintained.

I see there is a random_utils module that partly handles this, but only for numpy random states. As documented in CONTRIBUTING, its only purpose is to ensure that any numpy.random usage is tied to the global Python random state.

I've written a function that I widely use called ensure_rng that handles the resolution of an argument to a valid random state object. In fact, it can also convert between the stdlib random.Random and np.random.RandomState objects. This might be useful here, although it doesn't exactly handle what is done in random_utils.get_random_state, but it is compatible with it.

I also see that ReplayCompose is a good solution to the problem of creating reproducible pipelines, but I believe maintaining a random state in each augmentation instance is complementary, especially in the realm of testing.

ternaus commented 2 months ago

Thanks. Makes sense. Let me think about it.

ternaus commented 1 week ago

@Erotemic

You can set the numpy random state per transform, and in Compose, as follows:

aug.set_random_state(0)

albumentations-team / albumentations