facebookresearch / AugLy

A data augmentations library for audio, image, text, and video.
https://ai.facebook.com/blog/augly-a-new-data-augmentation-library-to-help-build-more-robust-ai-models/
4.97k stars, 301 forks

SpecAugment? #100

Open picheny-nyu opened 3 years ago

picheny-nyu commented 3 years ago

🚀 Feature

Add SpecAugment as a form of audio augmentation.

Motivation

SpecAugment (https://arxiv.org/abs/1904.08779) has resulted in huge improvements in speech recognition performance over the last few years.

Pitch

Any serious audio augmentation toolkit should include SpecAugment as a type of audio augmentation. It has become so popular in speech recognition that one wonders about the quality of a research paper that does not use it as standard processing. Combined with speed and frequency perturbation, it has become de rigueur in the speech recognition field. It should be added as an additional form of processing, accompanied by best practices for applying the technique, since there are many variations.

Alternatives

People use time and frequency perturbations by themselves, but when you have a lot of training data, the gains from these tend to wash out. SpecAugment improves results even with a lot of training data (at the expense of bigger models).

Additional context

You might also wish to include suggestions for how to integrate AugLy into popular speech recognition toolkits like Kaldi.

mthrok commented 3 years ago

Note: if you are using PyTorch, torchaudio has an implementation of SpecAugment as TimeStretch, TimeMasking, and FrequencyMasking.
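
For readers unfamiliar with the masking operations named above, here is a minimal, library-independent sketch in NumPy of frequency and time masking on a spectrogram. It is not torchaudio's or AugLy's implementation. The function name `spec_augment`, its parameters, and the choice to fill masked regions with zero are assumptions made for illustration, and the time-warping step from the paper is left out.

```python
import numpy as np


def spec_augment(spec, max_freq_width=8, max_time_width=20, rng=None):
    """Return a copy of `spec` (freq_bins x time_frames) with one random
    frequency band and one random time span set to zero.

    Hypothetical helper for illustration only; names and defaults are assumed.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = np.array(spec, dtype=float, copy=True)
    n_freq, n_time = out.shape

    # Frequency mask: zero out f consecutive frequency bins starting at f0.
    f = int(rng.integers(0, min(max_freq_width, n_freq) + 1))
    f0 = int(rng.integers(0, n_freq - f + 1))
    out[f0:f0 + f, :] = 0.0

    # Time mask: zero out t consecutive time frames starting at t0.
    t = int(rng.integers(0, min(max_time_width, n_time) + 1))
    t0 = int(rng.integers(0, n_time - t + 1))
    out[:, t0:t0 + t] = 0.0

    return out


# Example: an 80-bin, 100-frame dummy spectrogram of ones.
spec = np.ones((80, 100))
masked = spec_augment(spec, rng=np.random.default_rng(0))
```

The mask widths are drawn uniformly up to a maximum, so some calls apply a very narrow mask or none at all. Applying several masks per utterance, or filling with the spectrogram mean instead of zero, are among the variations a best-practices guide would need to cover.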

picheny-nyu commented 3 years ago

For sure. But torchaudio also comes with other standard augmentation processes, in which case people may not wish to switch between torchaudio and AugLy...