SeanNaren / deepspeech.torch

Speech Recognition using DeepSpeech2 network and the CTC activation function.
MIT License

Getting wrong prediction results on pretrained librispeech model #84

Closed saurabhvyas closed 7 years ago

saurabhvyas commented 7 years ago

Here is what the log file says after running the test: Average WER = 10.65 | CER = 3.19

But when I tested on real data (a .wav file, 16-bit, 16 kHz, mono, not too long), even simple recordings saying "Hello World", "red color", etc. give nonsensical results: I get either blanks or output nowhere near the original speech. I also observed that it's biased towards numbers. I confirmed this by recording a .wav file in which I said "five three two", and it predicted that correctly, but other than numbers the results are terrible. Any idea what might be happening here?
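Before blaming the model, it's worth ruling out a format mismatch: the thread assumes recordings are 16-bit PCM, 16 kHz, mono, and a file that silently deviates (e.g. 44.1 kHz stereo from a phone recorder) can produce exactly this kind of garbage output. A minimal sketch of such a sanity check using Python's standard-library `wave` module (the helper name `check_wav_format` is hypothetical, not part of deepspeech.torch):

```python
import wave

# Hypothetical helper: verify a recording matches the input format the
# pretrained model expects (16-bit PCM, 16 kHz sample rate, mono)
# before feeding it to the network.
def check_wav_format(path):
    with wave.open(path, "rb") as w:
        return (w.getsampwidth() == 2        # 2 bytes per sample = 16-bit
                and w.getframerate() == 16000  # 16 kHz
                and w.getnchannels() == 1)     # mono
```

If the check fails, resampling with a tool like `sox input.wav -r 16000 -b 16 -c 1 output.wav` before inference is a common fix.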

SeanNaren commented 7 years ago

I'm not sure the librispeech model has been trained on enough hours to give sensible responses on real test data. For comparison, Baidu required 11K hours to train a model that was sufficient for their use cases.

Internally we've run around 4k hours of audio, mixing internal as well as openly available sources, and have gotten acceptable results. I'd suggest merging multiple sources available online and training the model for more epochs on more data!

saurabhvyas commented 7 years ago

Alright, I'll add more data and try again. Thanks!

byuns9334 commented 7 years ago

@saurabhvyas which command did you run to test the pretrained model? Something like this: "th Test.lua -loadPath libri_deepspeech.t7 -trainingSetLMDBPath prepare_datasets/libri_lmdb/train -validationSetLMDBPath prepare_datasets/libri_lmdb/test"?

saurabhvyas commented 7 years ago

I tried this months ago and haven't used it since, so I am not sure.