ismorphism closed this issue 7 years ago.
Hi @morphism90, I suspect this is due to the way you recorded/converted your .wav and .flac files. Did you use a sound editor such as Audacity to set the sample rate and the extension? Also, if I remember correctly, using the same file extension as the one used in FormatLibriSpeech.lua helped.
The weirdest thing is that I tested the model on the default .flac files from the test-clean and train-clean directories, and none of them was recognized successfully.
Ah yes, this sounds weird. If you can reproduce it, could you post the error logs you get? You run the tests with Predict.lua, right?
Yes, I used Predict.lua. Do you mean error logs with CER and WER values?
@morphism90 OK, I read too quickly: the run succeeded but the transcription didn't :-) So if you used the same extension and sample rate for Train.lua and Predict.lua, I don't really see what went wrong here...
For prediction I used the following command:
th Predict.lua -modelPath deepspeech.torch/models/model_epoch_deepspeech.t7 -audioPath sample.wav
If the original audio track says "I love you", the model transcribes it as "y o". Looks bad. Other tests show the same behavior.
Also, when I used the AN4 dataset I got good results on the test and train sets, compared to the LibriSpeech dataset. Maybe the key difference is that in the AN4 case we used an4.dic for training rather than the default ./dictionary folder?
It shouldn't be an issue; I used the default ./dictionary and it was fine.
When you say "For prediction I used following command: th Predict.lua -modelPath deepspeech.torch/models/model_epoch_deepspeech.t7 -audioPath sample.wav", were there also transcription problems with .flac files instead of .wav?
There are no problems with transcription.
OK, so I think that if you trained your model with -audioExtension flac in FormatLibriSpeech.lua, you also need to use .flac files when testing your model's performance. File extensions seem to be critical here, and they need to match between training and testing.
Yeah, I agree, but my model doesn't even work on the training examples in .flac format.
@yfletberliac thanks for your help so far :)
Just to clear up a few things: judging from the log, the model seems to have trained OK-ish. My questions are:
What audio are you currently giving the model to predict on? Is it in the same format as the audio used for training (16-bit, 16 kHz)?
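One quick way to answer the format question for a .wav file is to read its header. The sketch below uses Python's standard `wave` module and assumes the target is 16-bit, 16 kHz, mono PCM; the mono requirement is an assumption not stated in the thread, and `check_wav_format` is a hypothetical helper name. Note that `wave` only reads PCM WAV, so .flac files would need another tool (for example, SoX's `soxi`).

```python
import sys
import wave


def check_wav_format(path, expected_rate=16000, expected_width_bytes=2, expected_channels=1):
    """Return a list of mismatches between a WAV header and the expected format.

    Hypothetical helper: an empty list means the file matches the expected
    format (16-bit, 16 kHz, mono by default; mono is an assumption).
    """
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()  # bytes per sample; 2 means 16-bit
        channels = wav.getnchannels()
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if width != expected_width_bytes:
        problems.append(f"sample width is {8 * width} bits, expected {8 * expected_width_bytes} bits")
    if channels != expected_channels:
        problems.append(f"{channels} channels, expected {expected_channels}")
    return problems


if __name__ == "__main__":
    # Usage: python check_wav.py sample.wav other.wav ...
    for p in sys.argv[1:]:
        issues = check_wav_format(p)
        print(p, "OK" if not issues else "; ".join(issues))
```

Running it on both the file passed to Predict.lua and one of the training files makes any format difference visible before looking at the model itself.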
Yes. I used http://www.online-convert.com/ to convert the audio to the desired .flac format.
@morphism90 I don't think file extensions are critical, because audio.spectrogram(...) converts your speech file into a (frequency × time) matrix. I trained the model on .flac files and have also tested it on .wav files, and I never faced this issue.
I would recommend loading the model in the Torch terminal, doing a forward pass on sample.wav or sample.flac, and checking the output.
Also, did you try using SoX to convert the audio?
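For reference, a SoX conversion to 16-bit, 16 kHz audio can be scripted as below. This is a minimal sketch, assuming the `sox` binary is installed, on PATH, and built with FLAC support; forcing mono (`-c 1`) is an assumption, and `sox_command`/`convert` are hypothetical helper names. In SoX, format options placed before the output filename apply to the output, and the output type follows the file extension.

```python
import subprocess
import sys


def sox_command(src, dst, rate=16000, bits=16, channels=1):
    """Build a SoX command line that rewrites `src` as `dst` in the given format.

    Options placed before `dst` apply to the output file, so SoX resamples
    and requantizes as needed; the output type comes from dst's extension.
    """
    return ["sox", src, "-r", str(rate), "-b", str(bits), "-c", str(channels), dst]


def convert(src, dst):
    # Requires the `sox` binary on PATH; raises CalledProcessError if SoX fails.
    subprocess.run(sox_command(src, dst), check=True)


if __name__ == "__main__":
    # Usage: python to16k.py input.wav output.flac
    convert(sys.argv[1], sys.argv[2])
```

Keeping the command in one place makes it easy to convert both the training files and the Predict.lua input with identical settings.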
OK, I tried SoX for the conversion. I think there may be a problem with the dataset structure. When I trained on AN4, recognition on the training data was fine. But when I used train-clean-100.tar.gz as libri_datasets/train, along with dev-clean and test-clean, I got the issue mentioned above.
Did you happen to train on AN4 first? If so, your problem might be related to #78.
Thanks a lot @markmuir87! That helped, and now my program runs properly.
Hi all! I trained the DeepSpeech model with this command:
th Train.lua -nGPU 1 -epochSave -batchSize 25 -validationBatchSize 25 -permuteBatch -trainingSetLMDBPath /prepare_datasets/libri_lmdb/train/ -validationSetLMDBPath /prepare_datasets/libri_lmdb/test/ -modelTrainingPath /model_test/ -saveFileName NewOne.t7 -epochs 100
If I remember correctly, this is the plain DeepSpeech model. I got the following results:
Training Epoch: 83 Average Loss: 0.004611 Average Validation WER: 17.13 Average Validation CER: 3.96
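To compare these validation numbers with what Predict.lua actually outputs, it can help to score a single transcription by hand. The sketch below assumes the usual definitions (word- or character-level Levenshtein distance divided by the reference length, times 100); I have not checked that Train.lua's evaluator normalizes in exactly this way, and the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (sub/ins/del each cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h from the hypothesis
                prev[j - 1] + (r != h),  # substitute (free if r == h)
            )
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent, relative to the reference word count."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent, relative to the reference length."""
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    # The "I love you" -> "y o" example from earlier in this thread.
    print(wer("i love you", "y o"), cer("i love you", "y o"))
```

With these definitions, the "I love you" → "y o" example scores a WER of 100 and a CER of 80, far from the validation WER of 17.13 and CER of 3.96, which suggests the audio fed to Predict.lua differs from what the validation set contained rather than the model simply being weak.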
For the training path I used train-clean-100.tar.gz, and for the validation path I used the union of test-clean.tar.gz and dev-clean.tar.gz. But when I tried to recognize any .wav or .flac files (including training/testing files), I got meaningless output. This is very strange given the training results above. Has anyone had a similar problem? Could someone explain what's wrong?