descriptinc / audiotools

Object-oriented handling of audio data, with GPU-powered augmentations, and more.
https://descriptinc.github.io/audiotools/
MIT License
233 stars 39 forks source link

Shuffle transform #46

Closed oreillyp closed 1 year ago

oreillyp commented 2 years ago

I've gone ahead and taken a stab at the Shuffle transform. Here are a few quick notes, organized as follows:

  1. Meta-transforms and prob
  2. Why not a simple context manager?
  3. Avoiding trivial permutations
  4. Performance
  5. Argument-checking
  6. Black

1. Meta-transforms and prob

For certain "meta"-transforms like shuffling, which operate on other (composed) transforms rather than just the audio, we may want to tweak how we treat the top-level prob argument. Currently, for transforms like Choose, it serves as a single on/off switch for all composed transforms due to the way masking is handled in BaseTransform.transform(). For something like shuffling, we probably want prob to instead control the probability that the order of composed transforms is shuffled (i.e. the probability that the meta-transform is applied to composed transforms). The way I've gone about this is to override BaseTransform.transform(), treating true/false mask values separately and using a context manager to temporarily re-order composed transforms. Hypothetically, if a user wants to retain a single on/off switch, they can simply put a Shuffle object inside another Compose object.

2. Why not a simple context manager?

Following Choose, Shuffle converts one or more given states to a single seed, which determines the shuffled order of composed transforms rather than the selection of a single transform. While we had discussed implementing shuffling as a simple context manager on the Compose class (a la filter) rather than an independent transform, that runs into the issue of seeding the shuffle operation. Choose allows users to seed a random selection operation in _instantiate() by aggregating any provided random states and storing the result as a seed argument. We could do the same by taking state / states arguments in the context manager and handling aggregation there, although that might get a little messy and would require adding a max_seed constructor argument to Compose. I don't have super strong feelings on this, but for now having Shuffle as a transform suits my needs fine.

3. Avoiding trivial permutations

Randomly shuffling the order of composed transforms can easily lead to trivial permutations, especially when the number of transforms is small. To avoid this, I'm just looping until a non-trivial permutation is obtained. Because the same starting seed / state should yield the same sequence of candidate permutations, the shuffle operation remains reproducible. For now I'm using Python's list comparison, as the number of transforms will generally be fairly small and this operation appears to be at least somewhat optimized.

4. Performance

Anecdotally, Shuffle appears to run similarly across settings of prob. This means it is comparable to Compose when prob=1.0, roughly twice as slow at prob=0.5, and far slower at prob=0.0. This is because the prob argument affects the order in which Shuffle applies composed transforms, not whether it applies them (unlike Compose, Choose, etc.). Thus, no computation is saved at lower probabilities.

5. Argument-checking

While writing the transform, I lost a lot of time due to a dumb mistake in which I passed a transform an argument in the form ('const', [val]) instead of ('const', val). This actually works or fails silently for a number of transforms, and only occasionally gets caught in the form of something opaque like an AudioSignal dimension error. As a placeholder/reminder for a more thorough fix, I've added the function util.verify_dist_tuple() with a call inside util.sample_from_dist().

6. Black

I updated black in the pre-commit config file, as the old version seems to be the cause of a common issue with click.

I still need to add a bunch of tests, so no worries if this PR sits on the shelf for a while.