gentaiscool / end2end-asr-pytorch

End-to-End Automatic Speech Recognition on PyTorch
MIT License

A problem with LibriSpeech test results #34

Open ssteven502tw opened 4 years ago

ssteven502tw commented 4 years ago

I have a question for you.

Is the low-rank transformer model poorly suited to recognizing longer English sentences (more than 30 words)? I found that the WER is high. The test results are shown below:


Epoch 75, "Test_clean, WER=15.98%, CER=9.79%", "Test_other, WER=31.55%, CER=17.71%"

For example: hyp = "as the chase drives away mary stands bewildered and perplexed on the doorstep her mind in a tumult of excitement in which hatred of the doctor distrust and suspicion of her"

gold = "as the chaise drives away mary stands bewildered and perplexed on the door step her mind in a tumult of excitement in which hatred of the doctor distrust and suspicion of her mother disappointment vexation and ill humor surge and swell among those delicate organizations on which the structure and development of the soul so closely depend doing perhaps an irreparable injury"
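The hypothesis above stops after "suspicion of her", so the last 30 reference words are never produced. As a rough illustration of how heavily that truncation weighs on the score, here is a minimal sketch of word error rate computed with the standard word-level Levenshtein distance, WER = (S + D + I) / N. This is not the repository's own scoring code; the function names are hypothetical, and the CER variant keeps spaces as characters, which is an assumption (toolkits differ on this).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion: reference token missing from hyp
                cur[j - 1] + 1,          # insertion: extra token in hyp
                prev[j - 1] + (r != h),  # match (cost 0) or substitution (cost 1)
            )
        prev = cur
    return prev[-1]


def wer(ref, hyp):
    """Word error rate: word-level edit distance divided by reference word count."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def cer(ref, hyp):
    """Character error rate (spaces counted as characters in this sketch)."""
    return edit_distance(list(ref), list(hyp)) / len(ref)


gold = ("as the chaise drives away mary stands bewildered and perplexed on the door step "
        "her mind in a tumult of excitement in which hatred of the doctor distrust and "
        "suspicion of her mother disappointment vexation and ill humor surge and swell "
        "among those delicate organizations on which the structure and development of "
        "the soul so closely depend doing perhaps an irreparable injury")
hyp = ("as the chase drives away mary stands bewildered and perplexed on the doorstep "
       "her mind in a tumult of excitement in which hatred of the doctor distrust and "
       "suspicion of her")

print(f"WER = {wer(gold, hyp):.2%}")  # prints "WER = 53.23%" (33 errors / 62 reference words)
```

For this single utterance, 31 of the 33 word errors are deletions, so the missing tail alone accounts for most of the WER; the two substitutions ("chase", "doorstep") barely matter by comparison.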


The later part of the utterance is not recognized at all. Is there any way to improve this?

Thanks

gentaiscool commented 4 years ago

Hi @ssteven502tw

There are several possible ways to improve performance. First, check whether the audio is being cut during preprocessing: because these utterances are long, you may have accidentally limited the length of the training audio. Second, check the maximum sequence length parameter used in training.
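Both suggestions come down to checking whether some limit silently shortens long utterances. A minimal audit sketch follows, assuming you can list each utterance's duration and transcript. The `Utterance` record, the 10 ms frame shift, and the `max_frames` / `max_target_len` limits are hypothetical stand-ins, not the repository's actual options; map them onto whatever your preprocessing and training configuration actually use.

```python
from dataclasses import dataclass

HOP_SECONDS = 0.01  # assumed 10 ms frame shift; adjust to your feature extractor


@dataclass
class Utterance:
    utt_id: str          # hypothetical identifier field
    duration_sec: float  # audio duration in seconds
    transcript: str      # reference text


def approx_num_frames(duration_sec, hop_sec=HOP_SECONDS):
    """Approximate feature-frame count (ignores window-length edge effects)."""
    return int(round(duration_sec / hop_sec))


def audit(utterances, max_frames, max_target_len):
    """Return IDs whose audio or transcript would exceed the given limits."""
    too_long_audio = [u.utt_id for u in utterances
                      if approx_num_frames(u.duration_sec) > max_frames]
    too_long_text = [u.utt_id for u in utterances
                     if len(u.transcript) > max_target_len]
    return too_long_audio, too_long_text


if __name__ == "__main__":
    data = [
        Utterance("a", 5.0, "hello world"),
        Utterance("b", 30.0, "x" * 400),
        Utterance("c", 10.0, "y" * 350),
    ]
    audio_ids, text_ids = audit(data, max_frames=2000, max_target_len=300)
    print("audio over limit:", audio_ids)       # ['b']
    print("transcript over limit:", text_ids)   # ['b', 'c']
```

If the test-set utterances are routinely longer than anything that survives these limits in training, the decoder may never have learned to keep emitting tokens that late in a sequence, which would be consistent with the truncated hypothesis above.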