Open picheny-nyu opened 3 years ago
Note: if you are using PyTorch, torchaudio has an implementation of SpecAugment as TimeStretch, TimeMasking, and FrequencyMasking.
For sure. But torchaudio also comes with other standard augmentation processes, in which case people may not wish to switch between torchaudio and AugLy.
🚀 Feature
Add SpecAugment as a form of audio augmentation.
Motivation
SpecAugment (https://arxiv.org/abs/1904.08779) has resulted in huge improvements in speech recognition performance over the last few years.
Pitch
Any serious audio augmentation toolkit should include SpecAugment as a type of audio augmentation. It has become so popular in speech recognition that one wonders about the quality of a research paper that does not use it as standard processing. Combined with speed and frequency perturbation, it has become de rigueur in the speech recognition field. It should be added as an additional form of processing and accompanied by best practices for applying the technique, as there are many variations.
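To make the request concrete, here is a minimal sketch of the masking part of SpecAugment, assuming the input is a 2-D NumPy array shaped (frequency bins, time frames). The function name `spec_augment`, its parameter names, and the default values are hypothetical and not an existing AugLy API. The time-warping step from the paper is left out, and masked regions are simply filled with a constant.

```python
import numpy as np


def spec_augment(spec, num_freq_masks=2, freq_mask_param=27,
                 num_time_masks=2, time_mask_param=100,
                 mask_value=0.0, rng=None):
    """Return a masked copy of a (n_freq, n_time) spectrogram.

    Hypothetical helper: applies frequency masking and time masking only
    (no time warping). Mask widths are drawn uniformly from
    [0, *_mask_param], clipped to the size of the corresponding axis.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = np.array(spec, dtype=float, copy=True)  # leave the input untouched
    n_freq, n_time = out.shape

    # Frequency masking: blank out bands of consecutive frequency bins.
    for _ in range(num_freq_masks):
        width = int(rng.integers(0, min(freq_mask_param, n_freq) + 1))
        start = int(rng.integers(0, n_freq - width + 1))
        out[start:start + width, :] = mask_value

    # Time masking: blank out spans of consecutive time frames.
    for _ in range(num_time_masks):
        width = int(rng.integers(0, min(time_mask_param, n_time) + 1))
        start = int(rng.integers(0, n_time - width + 1))
        out[:, start:start + width] = mask_value

    return out


# Example: mask a dummy 80-bin, 300-frame spectrogram.
dummy = np.ones((80, 300))
augmented = spec_augment(dummy, rng=np.random.default_rng(0))
```

The number of masks and their maximum widths are the main knobs that differ between published variants, which is why guidance on sensible defaults would be useful alongside the implementation.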
Alternatives
People use time and frequency perturbations by themselves, but when you have a lot of training data, the gains from those methods tend to wash out. SpecAugment improves results even with a lot of training data (at the expense of bigger models).
Additional context
You might also wish to include suggestions for how to integrate AugLy into popular speech recognition toolkits like Kaldi.