facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

Refactor/Improve SpecAugment #4970

Open gwenzek opened 1 year ago

gwenzek commented 1 year ago

Note: this issue is for the MLH fellowship

SpecAugment is a structured dropout applied to mel spectrograms. It masks some contiguous blocks of time frames in the audio, as well as some contiguous ranges of frequencies.

It's often seen as an "augmentation" technique, but I think it could be implemented like dropout, as an nn.Module that is only active in training mode, and we could put one by default in the MelSpectrogram layer.
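To make the dropout analogy concrete, here is a minimal sketch of what such a module could look like. It assumes PyTorch and an input of shape `(batch, time, freq)`; the class name, parameter names, and default values are hypothetical and are not taken from fairseq.

```python
import torch
from torch import nn


class SpecAugment(nn.Module):
    """Dropout-style SpecAugment sketch (hypothetical API, not fairseq's).

    Masks random time and frequency bands of a (batch, time, freq) tensor
    while in training mode, and is the identity in eval mode.
    """

    def __init__(self, freq_mask_width: int = 27, time_mask_width: int = 100,
                 n_freq_masks: int = 2, n_time_masks: int = 2,
                 mask_value: float = 0.0):
        super().__init__()
        self.freq_mask_width = freq_mask_width
        self.time_mask_width = time_mask_width
        self.n_freq_masks = n_freq_masks
        self.n_time_masks = n_time_masks
        self.mask_value = mask_value

    @staticmethod
    def _band_mask(batch: int, size: int, max_width: int, n_masks: int,
                   device: torch.device) -> torch.Tensor:
        """Return a boolean (batch, size) tensor, True where an axis index is masked."""
        max_width = min(max_width, size)
        # One width in [0, max_width] and one start per (example, mask).
        width = torch.randint(0, max_width + 1, (batch, n_masks, 1), device=device)
        start = (torch.rand(batch, n_masks, 1, device=device) * (size - width + 1)).long()
        idx = torch.arange(size, device=device).view(1, 1, size)
        # Union of all bands for each example, computed without a Python loop.
        return ((idx >= start) & (idx < start + width)).any(dim=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if not self.training:
            return x  # like dropout: no-op at inference time
        batch, time, freq = x.shape
        t_mask = self._band_mask(batch, time, self.time_mask_width,
                                 self.n_time_masks, x.device)
        f_mask = self._band_mask(batch, freq, self.freq_mask_width,
                                 self.n_freq_masks, x.device)
        # Broadcast (B, T, 1) | (B, 1, F) into a full (B, T, F) mask.
        mask = t_mask.unsqueeze(2) | f_mask.unsqueeze(1)
        return x.masked_fill(mask, self.mask_value)
```

Because the masks are built from broadcasted comparisons, each example in the batch gets its own bands without a per-sample loop, and calling `.eval()` on the parent model disables the masking the same way it does for `nn.Dropout`.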

Fairseq has an implementation, but it's a bit naive: https://github.com/facebookresearch/fairseq/blob/1164a7fc432a188d401895018eaa85175fb06f9d/fairseq/data/audio/feature_transforms/specaugment.py#L13

I'd like to see a nicer version:

Possible follow-up: implement a very fast "SpecAugment-like" transform that masks the input with a regular pattern, using just reshape and slice assignment, then compare its speed with the previous implementation.
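One way the regular-pattern idea could be sketched, assuming PyTorch and a `(batch, time, freq)` input. The function name, the period and width parameters, and the choice to mask the first few entries of each period are all assumptions for illustration, not an existing fairseq API.

```python
import torch


def regular_mask_(x: torch.Tensor, time_period: int = 50, time_width: int = 5,
                  freq_period: int = 20, freq_width: int = 2,
                  mask_value: float = 0.0) -> torch.Tensor:
    """Mask a (batch, time, freq) tensor in place with a fixed periodic pattern.

    Hypothetical helper: the first `time_width` frames of every `time_period`
    frames and the first `freq_width` bins of every `freq_period` bins are
    overwritten. A trailing partial period on either axis is left untouched.
    """
    batch, time, freq = x.shape
    n_t = time // time_period
    n_f = freq // freq_period
    # Split the time axis into (n_t, time_period). `view` never copies, so the
    # slice assignment below writes straight into `x`.
    x[:, : n_t * time_period].view(batch, n_t, time_period, freq)[:, :, :time_width] = mask_value
    # Same trick on the frequency axis: (n_f, freq_period).
    x[:, :, : n_f * freq_period].view(batch, time, n_f, freq_period)[..., :freq_width] = mask_value
    return x


x = torch.ones(2, 10, 6)
regular_mask_(x, time_period=5, time_width=1, freq_period=3, freq_width=1)
# Time frames 0 and 5 and frequency bins 0 and 3 are now zero.
```

This version uses `view` rather than `reshape` on purpose: `reshape` may return a copy for some inputs, in which case the assignment would not reach the original tensor. The pattern here is fully deterministic; a random offset per batch would be a natural extension but is left out of this sketch.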

NTR0314 commented 1 year ago

Not directly related to this question, but is it possible to use the SpecAugment technique for evaluating a model with 'fairseq-generate'?

If it is, what is the correct way to do it?