Closed saurabhvyas closed 7 years ago
I'm not sure the LibriSpeech model has been trained on enough hours to give sensible responses on real test data. For comparison, Baidu required 11K hours to train a model that was sufficient for their use cases.
Internally we've run around 4K hours of audio, mixing internal as well as openly available sources, and have gotten acceptable results. I'd suggest merging multiple available sources online and training the model for more epochs on more data!
Alright, I will add more data and try again. Thanks!
@saurabhvyas which command did you use to test the pretrained model? Something like `th Test.lua -loadPath libri_deepspeech.t7 -trainingSetLMDBPath prepare_datasets/libri_lmdb/train -validationSetLMDBPath prepare_datasets/libri_lmdb/test`?
I tried this months ago and haven't used it since, so I am not sure.
Here is what the log file says after running the test: Average WER = 10.65 | CER = 3.19
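For anyone unfamiliar with those numbers, WER (word error rate) and CER (character error rate) are edit-distance-based metrics. A minimal sketch of how they are typically computed, assuming a plain Levenshtein distance (function names here are my own, not from the repo):

```python
# Sketch of WER/CER computation via Levenshtein (edit) distance.
# These helpers are illustrative, not the project's actual metric code.

def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]

def wer(ref, hyp):
    """Word error rate, as a percentage of reference words."""
    r, h = ref.split(), hyp.split()
    return 100.0 * levenshtein(r, h) / len(r)

def cer(ref, hyp):
    """Character error rate, as a percentage of reference characters."""
    return 100.0 * levenshtein(ref, hyp) / len(ref)
```

So a WER of 10.65 means roughly one word in ten needs an edit to match the reference, which can still sound quite wrong on short utterances.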
But when I tested on real data (.wav files, 16-bit, 16 kHz, mono, not too long), even on simple recordings saying "Hello World", "red color", etc., I am getting nonsensical results: either blanks or output nowhere near the original speech. I also observed that it is biased towards numbers. I confirmed this by recording a .wav file in which I said "five three two", and it predicted it correctly, but other than numbers the results are terrible. Any idea what might be happening here?
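One easy thing to rule out is a format mismatch: if the recording is not actually 16-bit 16 kHz mono PCM, the model will see garbage. A minimal sanity check using only Python's stdlib `wave` module (the path is a placeholder):

```python
import wave

def check_format(path):
    """Return True if the WAV file is 16-bit, 16 kHz, mono PCM --
    the format this model expects. Path is a placeholder."""
    with wave.open(path, "rb") as w:
        return (w.getsampwidth() == 2        # 2 bytes per sample = 16-bit
                and w.getframerate() == 16000  # 16 kHz sample rate
                and w.getnchannels() == 1)     # mono
```

If the check fails, resampling with a tool such as `sox` or `ffmpeg` before inference is the usual fix.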