flashlight / wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit
https://github.com/facebookresearch/wav2letter/wiki

How can I make the streaming conv model generalize better? #891

Closed trangtv57 closed 3 years ago

trangtv57 commented 3 years ago

Thank you for the amazing repo. I successfully trained a streaming conv model starting from your pretrained model, and it has a good WER. In some cases, however, the model seems to be overfitting: it is so confident that adding an LM does not improve anything. My model is trained with only SpecAugment. I know of some data augmentation done with Kaldi, but it generates the augmented data physically on disk rather than on the fly, and I have a problem with storage on my server. Another solution I tried was increasing label smoothing, but it did not have much effect. Can you suggest some way to make the model generalize better? For example, how could I convert Kaldi's data augmentation to run on the fly like SpecAugment, and where should I start? Or are there other keywords I should look into for something like this? Thank you.

padentomasello commented 3 years ago

Hi @trangtv57, there are some SpecAugment parameters that are worth tuning. Please see: https://github.com/facebookresearch/flashlight/blob/43c59aa41bdc6293ba97f4ea4677f886de8bc458/flashlight/app/asr/common/Defines.h#L199-L203 and the paper for an explanation of the parameters: https://arxiv.org/pdf/1904.08779.pdf.
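To make the masking parameters concrete, here is a minimal sketch of the frequency and time masking described in the SpecAugment paper, written in plain Python. It is not flashlight's implementation: the function name, argument names, and default values are hypothetical and chosen only for illustration, time warping is omitted, and masked cells are set to zero on the assumption that the features are mean-normalized.

```python
import random


def spec_augment(spec, freq_mask_width=27, num_freq_masks=2,
                 time_mask_width=100, num_time_masks=2,
                 time_mask_ratio=1.0, rng=None):
    """Apply frequency and time masks to a spectrogram.

    spec is a list of frames (time), each a list of filterbank values (freq).
    All parameter names and defaults are illustrative assumptions.
    """
    if rng is None:
        rng = random.Random()
    num_frames = len(spec)
    num_bins = len(spec[0]) if num_frames else 0
    # Copy so the caller's features are left untouched.
    out = [list(frame) for frame in spec]

    # Frequency masks: zero a random band of consecutive bins in every frame.
    for _ in range(num_freq_masks):
        width = rng.randint(0, min(freq_mask_width, num_bins))
        start = rng.randint(0, num_bins - width)
        for frame in out:
            for f in range(start, start + width):
                frame[f] = 0.0

    # Time masks: zero a random run of consecutive frames. The run length is
    # capped both by time_mask_width and by a fraction of the utterance length.
    max_width = min(time_mask_width, int(time_mask_ratio * num_frames))
    for _ in range(num_time_masks):
        width = rng.randint(0, max_width)
        start = rng.randint(0, num_frames - width)
        for t in range(start, start + width):
            out[t] = [0.0] * num_bins

    return out
```

Wider or more numerous masks give stronger regularization, so these are the knobs to increase when the model looks overconfident.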

Additionally, it might be worth trying to increase weight decay which can help with overfitting as well.

Hope this helps!

trangtv57 commented 3 years ago

Thanks @padentomasello. I have already tried tuning the SpecAugment parameters and have run some experiments with the weight decay parameter. Do you have any ideas for augmentation that makes the audio sound like it has real-life noise, as Kaldi does, but on the fly? My audio contains noise like this (babble, MUSAN, or music), and my engine fails in some of these cases.
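One common way to get Kaldi-style additive noise without storing augmented copies is to mix a noise clip into each utterance at load time, scaled to a randomly chosen signal-to-noise ratio. The sketch below shows only that mixing step in plain Python. The function name `mix_at_snr` and its arguments are hypothetical, it is not part of flashlight or Kaldi, and it assumes speech and noise are already decoded to lists of float samples at the same sample rate.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db, rng=None):
    """Return speech with noise added at the requested SNR (in dB).

    Hypothetical helper for illustration; inputs are lists of float samples.
    """
    if rng is None:
        rng = random.Random()
    if not speech or not noise:
        return list(speech)

    # Take a noise segment from a random offset, looping the clip if it is
    # shorter than the utterance.
    offset = rng.randrange(len(noise))
    segment = [noise[(offset + i) % len(noise)] for i in range(len(speech))]

    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in segment) / len(segment)
    if speech_power == 0.0 or noise_power == 0.0:
        return list(speech)

    # Choose scale so that
    # 10 * log10(speech_power / (scale**2 * noise_power)) == snr_db.
    scale = math.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return [s + scale * n for s, n in zip(speech, segment)]
```

In a data loader, this would be called once per utterance with a noise clip (for example babble or music) and an SNR drawn at random each epoch, so every pass sees a different mixture and nothing extra is written to disk.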

tlikhomanenko commented 3 years ago

We are working on a code release for augmentation; please check back in a couple of weeks here: https://github.com/facebookresearch/flashlight/tree/master/flashlight/app/asr/augmentation

trangtv57 commented 3 years ago

Thanks @tlikhomanenko. I came across this ASR augmentation code while tracing the new version of flashlight, but I don't think it will really help with simulating real-life audio. Anyway, thanks for your answer; I will try another solution.