SpecAugment is a structured dropout applied to a mel spectrogram (MelSpectrogram).
It masks some contiguous spans of time frames, as well as some contiguous ranges of frequencies.
It's often seen as an "augmentation" technique, but I think it could be implemented like dropout, as an nn.Module,
and we could put one by default in the MelSpectrogram layer.
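To make the "like dropout" idea concrete, here is a minimal sketch of such a module. It assumes the input is laid out as (..., n_mels, time). The class name `SpecAugmentDropout` and the `max_freq_width` / `max_time_width` parameters are hypothetical, chosen for illustration only. It masks one frequency band and one time span, shared across the batch, and is a no-op in eval mode. Unlike nn.Dropout, it does not rescale the surviving values.

```python
import torch
from torch import nn


class SpecAugmentDropout(nn.Module):
    """Hypothetical sketch: mask one frequency band and one time span, in training only."""

    def __init__(self, max_freq_width: int = 8, max_time_width: int = 20):
        super().__init__()
        self.max_freq_width = max_freq_width
        self.max_time_width = max_time_width

    @staticmethod
    def _mask_span(x: torch.Tensor, dim: int, max_width: int) -> torch.Tensor:
        size = x.size(dim)
        # Naive sampling: one randint call for the width, one for the start.
        width = int(torch.randint(0, min(max_width, size) + 1, (1,)))
        start = int(torch.randint(0, size - width + 1, (1,)))
        pos = torch.arange(size, device=x.device)
        keep = (pos < start) | (pos >= start + width)
        # Reshape the 1-D mask so that it broadcasts along `dim`.
        shape = [1] * x.dim()
        shape[dim] = size
        return x * keep.view(shape).to(x.dtype)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        # Like nn.Dropout, do nothing outside of training.
        if not self.training:
            return spec
        spec = self._mask_span(spec, -2, self.max_freq_width)
        return self._mask_span(spec, -1, self.max_time_width)
```

Calling `module.eval()` turns the masking off, so the same model can be used unchanged for inference.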
Extract a function that does it on one dim: structured_dropout(x, dim, p, num_mask).
Make the parameters more meaningful: it should be easy to compare a "time_mask_p" in SpecAugment with the dropout "p".
Try to make it faster by calling randint only once.
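One possible shape for that function, as a sketch rather than a reference implementation. The assumptions here are mine: `p` is read as the approximate expected fraction of positions masked along `dim`, span widths are uniform in [0, max_width], and the same mask is shared by every slice along the other dimensions. A single `torch.randint` call produces both the start and the width of every span, by drawing one integer per span and splitting it with a floor division and a remainder.

```python
import torch


def structured_dropout(x: torch.Tensor, dim: int, p: float, num_mask: int = 1) -> torch.Tensor:
    """Zero `num_mask` contiguous spans along `dim`, masking roughly a fraction `p`."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"p must be in [0, 1], got {p}")
    size = x.size(dim)
    if size == 0 or num_mask < 1:
        return x
    # Assumption: widths are uniform in [0, max_width], so their mean is about
    # p * size / num_mask. Overlaps and spans cut off at the end mask a bit less.
    max_width = min(size, round(2 * p * size / num_mask))
    if max_width == 0:
        return x
    # Single randint call: each draw encodes a (start, width) pair.
    base = max_width + 1
    r = torch.randint(0, size * base, (num_mask,), device=x.device)
    start = torch.div(r, base, rounding_mode="floor")  # in [0, size)
    width = r % base                                   # in [0, max_width]
    pos = torch.arange(size, device=x.device)
    # inside[i, j] is True when position j falls in span i.
    inside = (pos >= start[:, None]) & (pos < (start + width)[:, None])
    keep = ~inside.any(dim=0)
    # Reshape the 1-D mask so that it broadcasts along `dim`.
    shape = [1] * x.dim()
    shape[dim] = size
    return x * keep.view(shape).to(x.dtype)
```

With this reading of `p`, `structured_dropout(spec, dim=-1, p=0.1, num_mask=2)` masks about 10% of the time frames and `dim=-2` does the same for frequency bins, which makes the value directly comparable to a dropout "p".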
Possible follow-up: implement a very fast "SpecAugment-like" transform that masks the input with a regular pattern, using just reshape and slice assignment, and compare its speed with the previous implementation.
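A sketch of what the regular-pattern variant could look like. The function name `regular_mask` and its `period` / `width` parameters are hypothetical. It zeroes the first `width` positions of every block of `period` positions along the last dimension, so the masked fraction is about `width / period`. I use `view` rather than `reshape` here so that the slice assignment is guaranteed to write into the output instead of a silent copy. No timing is included, the comparison is still to be done.

```python
import torch


def regular_mask(x: torch.Tensor, period: int, width: int) -> torch.Tensor:
    """Zero the first `width` of every `period` positions along the last dim."""
    if period < 1 or not 0 <= width <= period:
        raise ValueError("need period >= 1 and 0 <= width <= period")
    out = x.clone()
    num_blocks = out.size(-1) // period
    if num_blocks == 0 or width == 0:
        return out
    # Only whole blocks are masked; a trailing partial block is left untouched.
    usable = num_blocks * period
    # View the usable prefix as (..., num_blocks, period) without copying.
    blocks = out[..., :usable].view(*out.shape[:-1], num_blocks, period)
    # Slice assignment through the view writes directly into `out`.
    blocks[..., :width] = 0
    return out
```

To mask frequency bins instead of time frames, the input can be transposed before and after the call. A random offset per call would make the pattern less rigid, at the cost of one extra random draw.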
Note: this issue is for the MLH fellowship
Fairseq has an implementation, but it's a bit naive: https://github.com/facebookresearch/fairseq/blob/1164a7fc432a188d401895018eaa85175fb06f9d/fairseq/data/audio/feature_transforms/specaugment.py#L13
I'd like to see a nicer version:
structured_dropout(x, dim, p, num_mask)