pokey opened 1 year ago
I have a suggestion in a related vein. I think we can simplify the structure of the code here and do away with the `random.uniform(0, 1) >= 0.9` check. That's because the thing actually doing the augmentation (the code that ends up being called, in turn, by `self.feature_engineering_augmented`) is already a probabilistic augmenter transform. That is, part of the code for `augmented_feature_engineering` (which is what `self.feature_engineering_augmented` calls) looks like this:
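```python
import numpy as np
import scipy.io.wavfile
from audiomentations import AddGaussianNoise, Compose, Shift, TimeStretch


def augmented_feature_engineering(wavFile, settings):
    fs, rawWav = scipy.io.wavfile.read(wavFile)
    wavData = rawWav
    # <some stuff that I haven't included>
    augmenter = Compose([
        AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),
        TimeStretch(min_rate=0.8, max_rate=1.25, p=0.5),
        Shift(min_fraction=-0.5, max_fraction=0.5, p=0.5),
    ])
    # Each transform is applied independently with its own probability p.
    wavData = augmenter(samples=np.array(wavData, dtype="float32"), sample_rate=fs)
    # ... (rest of the feature engineering not shown)
```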
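The `p` parameter of transforms like `AddGaussianNoise`, `TimeStretch` and `Shift` is the probability that the transform will get applied (see, e.g., https://iver56.github.io/audiomentations/waveform_transforms/add_gaussian_noise/ and https://github.com/iver56/audiomentations/issues/168). So, what is currently happening is that there are two independent layers of randomness: first the `random.uniform(0, 1) >= 0.9` check decides whether a sample goes to the augmenter at all, and then each of the three transforms flips its own coin with probability 0.5.
(I hope I haven't got the math wrong; please correct me if I did.)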
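As a quick arithmetic check of the current setup, here is a small sketch. It assumes, as in the code above, three transforms with `p=0.5`, and that a sample reaches the augmenter only when `random.uniform(0, 1) >= 0.9` (probability 0.1). `prob_no_augmentation` is a hypothetical helper, not part of the project.

```python
def prob_no_augmentation(gate_prob: float, p: float, t: int) -> float:
    """Probability that a sample ends up with no transform applied.

    gate_prob: probability that the outer random check sends the sample to the augmenter.
    p: per-transform probability, assumed equal for all t transforms.
    t: number of transforms inside the Compose.
    """
    # Either the gate skips augmentation, or it lets the sample through
    # and none of the t independent transforms fires.
    return (1 - gate_prob) + gate_prob * (1 - p) ** t


# Current setup as read from this thread: gate_prob = 0.1, p = 0.5, t = 3.
q_current = prob_no_augmentation(gate_prob=0.1, p=0.5, t=3)
print(f"P(no augmentation)   = {q_current:.4f}")      # 0.9125
print(f"P(some augmentation) = {1 - q_current:.4f}")  # 0.0875
```

So under these assumptions, fewer than 9% of samples actually get augmented, slightly less than the 10% the outer check alone suggests.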
The structure of this code can thus be simplified as follows. Let $q$ be the probability that no augmentation will be done; this will be a hyperparameter that we control. And let $t$ be the number of augmenter transforms we're using (in the code above, $t = 3$). Since the augmenter transforms already come with a probability parameter $p$, we do not need the equivalent of `random.uniform(0, 1) >= 0.9`. Instead, assuming we use the same $p$ for all the augmenter transforms, we can just set $p$ based on the value we want for $q$. Because the transforms fire independently, the chance that none of them fires is $(1 - p)^{t}$, so $(1 - p)^{t} = q \iff p = 1 - \sqrt[t]{q}$. We can then treat $q$ as a hyperparameter that we can experiment with and tune (as per pokey's suggestion).
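A minimal sketch of that conversion from $q$ to $p$, assuming all $t$ transforms share the same $p$ and fire independently (which is what the per-transform `p` in audiomentations describes). `per_transform_p` is a hypothetical helper name.

```python
def per_transform_p(q: float, t: int) -> float:
    """Per-transform probability p such that (1 - p) ** t == q.

    q: desired probability that *no* transform is applied (the hyperparameter).
    t: number of transforms in the Compose, all assumed to share the same p.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    if t < 1:
        raise ValueError("t must be a positive integer")
    # p = 1 - q ** (1 / t), i.e. one minus the t-th root of q.
    return 1.0 - q ** (1.0 / t)


# Example: leave half of the samples completely unaugmented with 3 transforms.
p = per_transform_p(0.5, t=3)
print(round(p, 4))  # 0.2063
```

The resulting value would then be passed as `p=...` to each of `AddGaussianNoise`, `TimeStretch` and `Shift`. At the extremes, $q = 1$ gives $p = 0$ (never augment) and $q = 0$ gives $p = 1$ (every transform always fires).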
The only thing to keep in mind here is that the feature values for non-augmented datapoints are cached, so I believe your approach might result in a performance penalty, since every datapoint would then go through the (uncached) augmentation path.
Though to be honest, I'm guessing that the benefit of doing more augmentation will outweigh the cost; just a note.
I'm not sure if this helps with making sure the caching works for the non-augmented data points, but we could also set `p` to 1 for the transforms and adjust the amount of augmentation by keeping the `random.uniform(0, 1) >= some param` check.
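A minimal sketch of that alternative, assuming every transform is given `p=1.0` so that the outer check is the only source of randomness. `clean_features`, `augmented_features` and `get_features` are hypothetical stand-ins, not the project's functions. The point is that only the unaugmented path is deterministic, so only it can be memoized (here with `functools.lru_cache`).

```python
import random
from functools import lru_cache

# Counters so the caching behaviour is visible in a quick experiment.
calls = {"clean": 0, "augmented": 0}


@lru_cache(maxsize=None)
def clean_features(path: str) -> str:
    # Hypothetical stand-in for the deterministic feature extraction:
    # the same input always gives the same output, so it is safe to cache.
    calls["clean"] += 1
    return f"features({path})"


def augmented_features(path: str) -> str:
    # Hypothetical stand-in for the augmented path. With p=1.0 on every
    # transform the audio always changes, so results are never cached.
    calls["augmented"] += 1
    return f"augmented_features({path})"


def get_features(path: str, augment_prob: float, rng: random.Random) -> str:
    # The single, explicit random decision: augment with probability augment_prob.
    # This has the same probability as `random.uniform(0, 1) >= 1 - augment_prob`.
    if rng.random() < augment_prob:
        return augmented_features(path)
    return clean_features(path)


rng = random.Random(0)
for _ in range(1000):
    get_features("sample.wav", augment_prob=0.1, rng=rng)
print(calls)  # "clean" stays at 1 thanks to the cache; "augmented" is roughly 100
```

With this structure the amount of augmentation is exactly `augment_prob`, which keeps the reasoning to a single probability.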
My main thought is just that stacking one probabilistic thing on top of another makes things harder to reason about; it would be clearer if we either removed the `random.uniform` check or made the transforms deterministic by setting `p = 1`.
Today, it appears that augmentation is only occurring 10% of the time:
https://github.com/chaosparrot/parrot.py/blob/5b57d121a2283d0fd0ca66eaf6fdcb1620b3b5cd/lib/audio_dataset.py#L69-L72