TensorSpeech / TensorFlowASR

TensorFlowASR: Almost State-of-the-art Automatic Speech Recognition in Tensorflow 2. Supports languages that can use characters or subwords
https://huylenguyen.com/asr
Apache License 2.0

rnn_transducer #231

Open arthur-compton opened 2 years ago

arthur-compton commented 2 years ago

I've been running the examples in the "conformer/" and in the "rnn_transducer/" directories and comparing the models with those already provided on drive.

The conformer training works as expected, and the results of the model I trained are almost identical to the results obtained with the pretrained model (I am using the three librispeech training sets for training, 960 hours).

The training in the rnn_transducer example, however, does not converge to anything usable. I've tried both the configuration in the codebase and the slightly different configuration on drive. In both cases the loss decreases only slightly during training, far too little for the final model to have learned much.

My guess is that there is something broken in the rnn_transducer example. Has anyone tried it out with a recent version of the code? I've tried version 1.0.3 (TF2.6), 1.0.1, and 1.0.0 (TF2.4.1): in all cases the training doesn't really converge.

Any suggestion is very much appreciated!

maxeduc commented 1 year ago

Just tried this repo, and I agree. It seems that either the wrong config file was uploaded or there's a regression in the repo (or in TensorFlow). Any tips on what's happening here would be greatly appreciated.

[image: tensorspeech-loss]

yiqiaoc11 commented 1 year ago

Same issue here. Adam with warmup steps set to 40000 didn't learn anything. @usimarit, could you take a look at the code?
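
For anyone checking the warmup setting, here is a minimal sketch of how a Transformer-style warmup schedule behaves with 40000 warmup steps. It assumes the optimizer uses the usual linear-warmup, inverse-square-root-decay rule; whether the rnn_transducer config actually uses this schedule is not confirmed in this thread. The function name `warmup_lr` and the value `d_model=144` are hypothetical and only for illustration.

```python
def warmup_lr(step, d_model=144, warmup_steps=40000):
    """Transformer-style schedule: linear warmup, then inverse-sqrt decay.

    d_model=144 is a hypothetical value chosen for illustration only.
    """
    step = max(step, 1)  # avoid 0 ** -0.5 at the very first step
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


if __name__ == "__main__":
    # Print the learning rate at a few points before, at, and after warmup.
    for step in (1, 1000, 10000, 40000, 100000):
        print(step, f"{warmup_lr(step):.2e}")
```

Under this assumption the learning rate peaks at step 40000 and is much smaller for the first few thousand steps, so a nearly flat loss early in training would not by itself indicate a bug; a loss that stays flat well past the warmup point would.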