chaosparrot / parrot.py

Computer interaction using audio and speech recognition
MIT License

Experiment with augmenting a higher percentage of the dataset #13

Open pokey opened 1 year ago

pokey commented 1 year ago

Today, it appears that augmentation is only occurring 10% of the time

https://github.com/chaosparrot/parrot.py/blob/5b57d121a2283d0fd0ca66eaf6fdcb1620b3b5cd/lib/audio_dataset.py#L69-L72
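Roughly, the pattern at those lines is something like the following (a paraphrase rather than the exact code; the cached-path name here is hypothetical):

import random

# The augmented path is only taken ~10% of the time, because
# random.uniform(0, 1) >= 0.9 holds with probability 0.1
if self.augmented_samples is not None and random.uniform(0, 1) >= 0.9:
    feature_data = self.feature_engineering_augmented(wav_file)
else:
    feature_data = cached_features(wav_file)  # hypothetical stand-in for the cached, non-augmented path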

ym-han commented 1 year ago

I have a suggestion in a related vein. I think we can simplify the structure of the code here and do away with the random.uniform(0, 1) >= 0.9 check. The reason is that the thing actually doing the augmentation --- what ends up being called in turn by self.feature_engineering_augmented --- is already a probabilistic augmenter transform. That is, part of the code for augmented_feature_engineering (which is what gets called by self.feature_engineering_augmented) looks like this:

# Imports assumed to come from elsewhere in the module (not part of the quoted snippet)
import numpy as np
import scipy.io.wavfile
from audiomentations import AddGaussianNoise, Compose, Shift, TimeStretch

def augmented_feature_engineering( wavFile, settings ):
    fs, rawWav = scipy.io.wavfile.read( wavFile )
    wavData = rawWav
    # <some stuff that I haven't included>

    # Each transform below applies independently with probability p=0.5
    augmenter = Compose([
        AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),
        TimeStretch(min_rate=0.8, max_rate=1.25, p=0.5),
        Shift(min_fraction=-0.5, max_fraction=0.5, p=0.5),
    ])
    wavData = augmenter(samples=np.array(wavData, dtype="float32"), sample_rate=fs)

The formal parameter p for transforms like AddGaussianNoise, TimeStretch and Shift is the probability that that transform will get applied (see, e.g., https://iver56.github.io/audiomentations/waveform_transforms/add_gaussian_noise/ and https://github.com/iver56/audiomentations/issues/168). So, what is currently happening is that a datapoint is only considered for augmentation 10% of the time, and even then each of the three transforms fires independently with probability 0.5, so the chance that a datapoint ends up with at least one augmentation applied is only $0.1 \times (1 - (1 - 0.5)^3) = 0.1 \times 0.875 = 0.0875$, i.e. about 8.75%.

(I hope I haven't got the math wrong --- please correct me if I did.)
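For concreteness, a quick sanity check of that arithmetic (just a throwaway snippet of mine, not code from the repo):

gate = 0.1                        # probability that the random.uniform(0, 1) >= 0.9 check passes
p = 0.5                           # per-transform application probability in the Compose above
t = 3                             # number of transforms

at_least_one = 1 - (1 - p) ** t   # 0.875: chance some transform fires, given the gate passed
print(gate * at_least_one)        # 0.0875: overall chance a datapoint gets augmented at all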

The structure of this code can thus be simplified as follows. Let $q$ be the probability that no augmentation will be done; this will be a hyperparameter that we control. And let $t$ be the number of augmenter transforms we're using (in the code above, this is 3). Since the augmenter transforms already come with a formal probability parameter $p$, we do not need the equivalent of random.uniform(0, 1) >= 0.9. Instead, we can just set $p$ for the transforms based on the value we want for $q$ via $(1 - p)^{t} = q \iff p = 1 - \sqrt[t]{q}$, assuming we use the same $p$ for all the augmenter transforms. We can then treat $q$ as a hyperparameter that we can experiment with and tune (as per pokey's suggestion).
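Something like the following is what I have in mind (a sketch only; build_augmenter is a hypothetical helper, not something that exists in the repo):

from audiomentations import AddGaussianNoise, Compose, Shift, TimeStretch

def build_augmenter(q: float, t: int = 3) -> Compose:
    # q is the desired probability of applying no augmentation at all (the hyperparameter we tune).
    # With the same p for every transform, (1 - p)^t == q, so p = 1 - q ** (1 / t).
    p = 1 - q ** (1 / t)
    return Compose([
        AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=p),
        TimeStretch(min_rate=0.8, max_rate=1.25, p=p),
        Shift(min_fraction=-0.5, max_fraction=0.5, p=p),
    ])

# e.g. q = 0.9 leaves ~90% of datapoints untouched; p = 1 - 0.9 ** (1 / 3) ≈ 0.0345
augmenter = build_augmenter(0.9)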

pokey commented 1 year ago

The only thing to keep in mind here is that for non-augmented datapoints, the feature values are cached, so I believe your approach might result in a performance penalty.

Tho tbh I'm guessing the benefits of doing more augmentation will outweigh the costs, but just a note.

ym-han commented 1 year ago

I'm not sure if this helps with making sure the caching works for the non-augmented data points, but we could also set p to 1 for the transforms, and adjust the amount of augmentation by keeping random.uniform(0, 1) >= some param.
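Concretely, that alternative would look roughly like this (again just a sketch; the threshold name is made up):

import random

no_augment_threshold = 0.9   # hypothetical hyperparameter replacing the hard-coded 0.9

# With the transforms built with p=1, they always apply once we take this branch,
# so this outer check alone controls how much augmentation happens.
if self.augmented_samples is not None and random.uniform(0, 1) >= no_augment_threshold:
    feature_data = self.feature_engineering_augmented(wav_file)
else:
    feature_data = cached_features(wav_file)  # cached, non-augmented path (hypothetical name)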

My main thought is just that stacking a probabilistic thing on top of another probabilistic thing makes things harder to reason about --- it would be clearer if we either removed the random.uniform stuff or made the transforms deterministic by setting p = 1.