HawkAaron / warp-transducer

A fast parallel implementation of RNN Transducer.
Apache License 2.0
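For context, a minimal sketch of calling this library's PyTorch binding (warprnnt_pytorch), following the pattern in the repo README; the tensor sizes below are arbitrary placeholders:

```python
import torch
from warprnnt_pytorch import RNNTLoss

rnnt_loss = RNNTLoss()  # blank label defaults to index 0

B, T, U, V = 2, 10, 5, 30  # batch, frames, target length, vocab (placeholders)
acts = torch.randn(B, T, U + 1, V, requires_grad=True)   # joint network outputs (unnormalized)
labels = torch.randint(1, V, (B, U), dtype=torch.int32)  # targets must not contain the blank (0)
act_lens = torch.full((B,), T, dtype=torch.int32)        # frames per utterance
label_lens = torch.full((B,), U, dtype=torch.int32)      # labels per utterance

loss = rnnt_loss(acts, labels, act_lens, label_lens)
loss.backward()
```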

There is a problem with training an RNN-T model #96

Open scufan1990 opened 2 years ago

scufan1990 commented 2 years ago

Hi, I have a problem training a Conformer+RNN-T model. What CER and WER can be reached with one GPU?

I'm training the model on one RTX TITAN GPU: a Conformer (16 encoder layers, encoder dim 144, 1 decoder layer, decoder dim 320) on LibriSpeech 960h. After 50 epochs of training, the CER is about 27 and does not decrease any further.

Could you tell me why?

aheba commented 2 years ago

Hello, we have a similar implementation in SpeechBrain (implemented in Python with Numba); you can take a look at https://github.com/speechbrain/speechbrain/tree/develop/recipes/LibriSpeech/ASR/transducer. The transducer loss implementation is here: https://github.com/speechbrain/speechbrain/blob/develop/speechbrain/nnet/loss/transducer_loss.py

Otherwise, torchaudio also supports this loss as of torchaudio 1.0; check: https://pytorch.org/audio/stable/functional.html#rnnt-loss
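A minimal sketch of the torchaudio call (per the torchaudio docs, logits have shape (batch, time, target_len + 1, vocab) and the target/length tensors are int32; the sizes here are placeholders):

```python
import torch
import torchaudio.functional as F

B, T, U, V = 4, 50, 20, 100  # batch, frames, target length, vocab (placeholders)
logits = torch.randn(B, T, U + 1, V, requires_grad=True)  # joint network output
targets = torch.randint(1, V, (B, U), dtype=torch.int32)  # no blank (0) in targets
logit_lengths = torch.full((B,), T, dtype=torch.int32)
target_lengths = torch.full((B,), U, dtype=torch.int32)

loss = F.rnnt_loss(logits, targets, logit_lengths, target_lengths,
                   blank=0, reduction="mean")
loss.backward()
```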

We are working on supporting the torchaudio loss within SpeechBrain as well; see: https://github.com/speechbrain/speechbrain/pull/1199

flp1990 commented 2 years ago

Hi, thank you! I will try it later.

csukuangfj commented 2 years ago

There is also an implementation at https://github.com/csukuangfj/optimized_transducer that uses less GPU memory.
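The memory savings come from its logits layout: instead of a padded 4-D (batch, time, target_len + 1, vocab) tensor, it expects the per-utterance logits concatenated into a 2-D tensor. A rough sketch based on that repo's README (the exact signature, including `from_log_softmax`, should be verified there):

```python
import torch
import optimized_transducer

B, V = 2, 30                                              # placeholders
logit_lengths = torch.tensor([10, 8], dtype=torch.int32)  # frames per utterance
target_lengths = torch.tensor([5, 4], dtype=torch.int32)  # labels per utterance
targets = torch.randint(1, V, (B, 5), dtype=torch.int32)  # padded targets, no blank (0)

# Concatenated layout: sum over the batch of T_i * (U_i + 1) rows, V columns.
sum_TU = int((logit_lengths * (target_lengths + 1)).sum())
logits = torch.randn(sum_TU, V, requires_grad=True)

loss = optimized_transducer.transducer_loss(
    logits=logits, targets=targets,
    logit_lengths=logit_lengths, target_lengths=target_lengths,
    blank=0, reduction="mean",
    from_log_softmax=False,  # logits are unnormalized activations
)
loss.backward()
```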

yufang67 commented 2 years ago

Hi @scufan1990, did you resolve the issue? I have some training runs in which the RNN-T loss stops decreasing.