HawkAaron / E2E-ASR

PyTorch Implementations for End-to-End Automatic Speech Recognition

Two problems about training and decoding #6

Closed by Marcovaldong 5 years ago

Marcovaldong commented 5 years ago

Hi @HawkAaron, I'm trying to train a transducer with PyTorch (I prefer it to MxNet), and I changed the code of this repo following another implementation written in MxNet. However, the model does not converge to a good result. Is there something wrong in my code?

Another problem: I tried to replace the code here with a while loop, but the model never exits the while loop. Is there any difference between the two implementations?
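For readers following along, here is a minimal sketch of greedy transducer decoding written as a while loop. It is not the code from this repo or from the MxNet implementation. The `step` callable, the `BLANK` index of 0, and the `max_symbols_per_frame` cap are all assumptions made for illustration. One way a loop of this shape can fail to terminate is when the frame index only advances on a blank prediction and the model keeps predicting non-blank symbols, so the sketch caps the number of symbols emitted per frame.

```python
from typing import Any, Callable, List, Sequence, Tuple

BLANK = 0  # assumed index of the blank symbol (hypothetical choice)


def greedy_decode(
    enc_frames: Sequence[Any],
    step: Callable[[Any, int, Any], Tuple[int, Any]],
    init_state: Any = None,
    max_symbols_per_frame: int = 5,
) -> List[int]:
    """Greedy transducer decoding as a while loop.

    `step(frame, last_symbol, state)` is a hypothetical stand-in for the
    prediction network plus joint network: it returns the most likely
    symbol and the updated prediction-network state.
    """
    hyp: List[int] = []
    state = init_state
    last = BLANK
    t = 0          # index of the current encoder frame
    emitted = 0    # symbols emitted on the current frame
    while t < len(enc_frames):
        symbol, new_state = step(enc_frames[t], last, state)
        if symbol == BLANK or emitted >= max_symbols_per_frame:
            # Move to the next frame; the cap guarantees termination
            # even if the model never predicts blank.
            t += 1
            emitted = 0
        else:
            # Emit the symbol and stay on the same frame.
            hyp.append(symbol)
            last = symbol
            state = new_state
            emitted += 1
    return hyp
```

With a toy `step` that never predicts blank, the loop still ends because each frame emits at most `max_symbols_per_frame` symbols before the frame index advances.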

Marcovaldong commented 5 years ago

@HawkAaron I fixed the second problem; it was caused by a bug.

The first one still exists. Maybe it's also a bug. I'll check my code.

HawkAaron commented 5 years ago

@Marcovaldong You may compare the results between MxNet and PyTorch; they can differ by 2% absolute if not carefully tuned.

Marcovaldong commented 5 years ago

@HawkAaron You mean MxNet gets the better result, right?

Can I compare the loss between MxNet and PyTorch? Right now, the PyTorch version cannot converge to the same loss as the MxNet version (the PyTorch loss only converges to 9.1, whereas the MxNet loss converges to 0.5), and the CER is far too high (about 94.5%, versus about 1.41% for the MxNet version).

I don't know what is preventing the model from converging.
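When comparing CER across two implementations, it helps to score both with the same function. The following is a minimal sketch of character error rate as Levenshtein distance divided by reference length. It is not the scoring script used in either implementation, and the function names `edit_distance` and `cer` are chosen here for illustration.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(ref: Sequence, hyp: Sequence) -> float:
    """Character error rate: edit distance over reference length."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

Because insertions count as errors, CER computed this way can exceed 100% when the hypothesis is much longer than the reference.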

HawkAaron commented 5 years ago

@Marcovaldong Did you solve the problem? The losses should be comparable. Have you reproduced the results on TIMIT?