Closed vsl9 closed 5 years ago
The original SpecAugment paper uses much higher parameter values: width_freq_mask = 27 (with 80 filter banks; jasper uses 6 with 64 filter banks) and width_time_mask = 100 (jasper uses 6).
Google's SpecAugment code (https://github.com/tensorflow/lingvo/blob/master/lingvo/core/spectrum_augmenter.py#L37-L42) uses less aggressive values: width_freq_mask = 10 and width_time_mask = 50.
Could you please comment if you tried more aggressive values (compared to 6)? Thanks!
Hmm, though Google's SpecAugment code seems to use freq_mask_count = 1 and time_mask_count = 1 (while jasper uses 2), so all in all the difference is probably not very significant.
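For a rough side-by-side, here is a back-of-the-envelope sketch of the expected number of masked bins or frames per utterance. It assumes each mask width is drawn uniformly from the integers 0..max_width (so the mean width is max_width / 2) and ignores overlap between masks. The helper name expected_masked is hypothetical, and the actual sampling in either code base may differ.

```python
def expected_masked(mask_count, max_width):
    """Expected total masked bins/frames, ignoring overlap between masks.

    Assumption: each mask width is uniform over the integers 0..max_width,
    so its mean is max_width / 2.
    """
    return mask_count * max_width / 2


# Values quoted in this thread.
jasper_freq = expected_masked(mask_count=2, max_width=6)   # 6.0 bins
lingvo_freq = expected_masked(mask_count=1, max_width=10)  # 5.0 bins
jasper_time = expected_masked(mask_count=2, max_width=6)   # 6.0 frames
lingvo_time = expected_masked(mask_count=1, max_width=50)  # 25.0 frames

print(jasper_freq, lingvo_freq, jasper_time, lingvo_time)
```

Under these assumptions the frequency masking comes out about the same in the two setups, while the time-mask numbers are further apart.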
Added time and frequency masks similar to SpecAugment (https://arxiv.org/abs/1904.08779)
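For readers unfamiliar with the technique, here is a minimal, dependency-free sketch of time and frequency masking. It is not the jasper or lingvo implementation: the function name spec_augment, the list-of-rows spectrogram layout, the zero fill value, and the uniform width sampling are all assumptions made for illustration. The parameter defaults mirror the jasper values quoted in this thread (2 masks, width 6).

```python
import random


def spec_augment(spec, freq_masks=2, time_masks=2,
                 width_freq_mask=6, width_time_mask=6, rng=None):
    """Return a masked copy of spec, indexed as spec[freq_bin][time_frame].

    Assumptions: mask widths are uniform over 0..width_*_mask (inclusive),
    and masked cells are filled with 0.0.
    """
    rng = rng or random.Random()
    n_freq = len(spec)
    n_time = len(spec[0]) if n_freq else 0
    out = [list(row) for row in spec]  # copy so the input is left untouched

    # Frequency masks: zero out a band of consecutive filter-bank rows.
    for _ in range(freq_masks):
        width = rng.randint(0, min(width_freq_mask, n_freq))
        start = rng.randint(0, n_freq - width)
        for f in range(start, start + width):
            out[f] = [0.0] * n_time

    # Time masks: zero out a run of consecutive frames in every row.
    for _ in range(time_masks):
        width = rng.randint(0, min(width_time_mask, n_time))
        start = rng.randint(0, n_time - width)
        for row in out:
            row[start:start + width] = [0.0] * width

    return out


# Usage: 64 filter banks, 100 frames, all ones so masked cells are easy to spot.
features = [[1.0] * 100 for _ in range(64)]
augmented = spec_augment(features, rng=random.Random(0))
```

Passing a seeded random.Random makes the masks reproducible, which is handy when comparing different width settings.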