Closed aheba closed 2 years ago
AFAIK, normalizing by length doesn't make sense. RNN-T loss treats a whole sequence as a single input object. In practice, normalization just scales your gradient values, so to compare against the normalized loss you would need to scale the learning rate as well. That said, you can pass this as a parameter if you like.
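To make the point above concrete, here is a toy sketch (hypothetical numbers, a single scalar parameter and a plain SGD step) showing that dividing an utterance-level loss by its input length T only rescales the gradient by 1/T, so the identical parameter update is recovered by scaling the learning rate by T:

```python
def sgd_step(w, grad, lr):
    """One plain SGD update on a scalar parameter."""
    return w - lr * grad

T = 128          # input sequence length in frames (hypothetical)
w = 0.5          # a single model parameter (hypothetical)
grad_sum = 3.0   # gradient of the summed, un-normalized RNN-T loss (hypothetical)
lr = 0.5         # base learning rate

# Update from the un-normalized loss.
w_unnorm = sgd_step(w, grad_sum, lr)

# Length-normalized loss: the gradient shrinks by 1/T; compensating
# with a learning rate of lr * T yields exactly the same update.
w_norm = sgd_step(w, grad_sum / T, lr * T)

print(w_unnorm == w_norm)  # → True
```

This is why the un-normalized and normalized objectives are equivalent up to a learning-rate rescaling; the choice only matters when a fixed learning rate is shared across utterances of different lengths.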
Hello,
We noticed that your implementation doesn't normalize the loss by the input sequence length. Here is an example of training on the TIMIT corpus:
RNNT loss, torchaudio: [training-loss plot]
RNNT loss, spbrain: [training-loss plot]