NVIDIA / OpenSeq2Seq

Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
https://nvidia.github.io/OpenSeq2Seq
Apache License 2.0

SpecAugment-like time and frequency masks #451

Closed: vsl9 closed this 5 years ago

vsl9 commented 5 years ago

Added time and frequency masks similar to SpecAugment (https://arxiv.org/abs/1904.08779)
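For readers unfamiliar with the technique, here is a minimal pure-Python sketch of SpecAugment-style frequency and time masking. It is not the code from this PR. The function name `spec_augment_masks`, its parameter names, and the choice to draw each mask width and start position uniformly at random (in the spirit of the SpecAugment paper) are assumptions for illustration. The defaults of 2 masks of width up to 6 mirror the Jasper values quoted in the comments below.

```python
import random
from typing import List, Optional

# Hypothetical layout: rows are time frames, columns are frequency bins.
Spectrogram = List[List[float]]


def spec_augment_masks(spec: Spectrogram,
                       freq_mask_count: int = 2,
                       freq_mask_width: int = 6,
                       time_mask_count: int = 2,
                       time_mask_width: int = 6,
                       mask_value: float = 0.0,
                       rng: Optional[random.Random] = None) -> Spectrogram:
    """Return a copy of `spec` with SpecAugment-style masks applied (sketch)."""
    rng = rng or random.Random()
    out = [row[:] for row in spec]  # copy so the caller's input is untouched
    num_frames = len(out)
    num_bins = len(out[0]) if num_frames else 0

    # Frequency masks: a band of f consecutive bins, masked in every frame.
    for _ in range(freq_mask_count):
        f = rng.randint(0, min(freq_mask_width, num_bins))  # width in [0, W]
        f0 = rng.randint(0, num_bins - f)                    # band fits inside
        for row in out:
            for b in range(f0, f0 + f):
                row[b] = mask_value

    # Time masks: t consecutive frames, masked across all bins.
    for _ in range(time_mask_count):
        t = rng.randint(0, min(time_mask_width, num_frames))
        t0 = rng.randint(0, num_frames - t)
        for frame in range(t0, t0 + t):
            out[frame] = [mask_value] * num_bins
    return out
```

Usage: `spec_augment_masks([[1.0] * 64 for _ in range(200)], rng=random.Random(0))` returns a 200 × 64 copy in which a few whole columns (frequency bands) and a few whole rows (time spans) are set to `mask_value`. Passing a seeded `random.Random` makes the masks reproducible.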

vadimkantorov commented 4 years ago

The original SpecAugment paper uses much larger parameter values: width_freq_mask = 27 (with 80 filter banks, whereas Jasper uses 6 with 64 filter banks) and width_time_mask = 100 (Jasper uses 6).

Google's SpecAugment code (https://github.com/tensorflow/lingvo/blob/master/lingvo/core/spectrum_augmenter.py#L37-L42) uses less aggressive values: width_freq_mask = 10 and width_time_mask = 50.

Could you please comment on whether you tried more aggressive values (i.e., larger than 6)? Thanks!

vadimkantorov commented 4 years ago

Hmm, though Google's SpecAugment code seems to use freq_mask_count = 1 and time_mask_count = 1 (while Jasper uses 2), so overall the difference is probably not very significant.
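The count-versus-width trade-off discussed above can be made explicit with a little arithmetic. The sketch below uses only the counts and widths quoted in this thread; the `MaskPolicy` class is hypothetical, and the "expected" figures assume each mask width is drawn uniformly from {0, ..., W} (mean W / 2) and ignore overlap between masks, so they are rough upper estimates rather than measured values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MaskPolicy:
    """Hypothetical summary of a masking configuration."""
    name: str
    freq_mask_count: int
    freq_mask_width: int  # max bins per frequency mask
    time_mask_count: int
    time_mask_width: int  # max frames per time mask

    def max_masked_bins(self) -> int:
        # Upper bound: every mask at full width, no overlap.
        return self.freq_mask_count * self.freq_mask_width

    def max_masked_frames(self) -> int:
        return self.time_mask_count * self.time_mask_width

    def expected_masked_bins(self) -> float:
        # Assumes width ~ uniform{0..W}, so its mean is W / 2; overlap ignored.
        return self.freq_mask_count * self.freq_mask_width / 2

    def expected_masked_frames(self) -> float:
        return self.time_mask_count * self.time_mask_width / 2


# Values as quoted in the comments above.
JASPER = MaskPolicy("Jasper", 2, 6, 2, 6)
LINGVO = MaskPolicy("lingvo", 1, 10, 1, 50)

if __name__ == "__main__":
    for p in (JASPER, LINGVO):
        print(f"{p.name}: <= {p.max_masked_bins()} bins, "
              f"<= {p.max_masked_frames()} frames")
```

With these numbers the frequency-mask upper bounds are 12 bins (Jasper) versus 10 bins (lingvo), and the time-mask upper bounds are 12 frames versus 50 frames.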