SeanNaren / deepspeech.torch

Speech Recognition using DeepSpeech2 network and the CTC activation function.
MIT License

Always getting the same predictions #79

Closed: leandrosnm closed this issue 7 years ago

leandrosnm commented 7 years ago

Hi,

I'm trying to run Predict.lua with the pretrained models in the repository, and I see the same issue with all of them. My input is a LibriSpeech file in FLAC or WAV format. When I print the predictions from Predict.lua, each column has the same value in every row.

I've checked that the audio is sampled at 16 kHz, and the spectrogram looks fine for both the WAV and FLAC files.

Any idea what might be happening?
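For anyone who wants to repeat the sample-rate check on a WAV file, here is a minimal sketch using only Python's standard `wave` module. The helper names `describe_wav` and `is_mono_16khz` are made up for illustration and are not part of this repository. The `wave` module reads only uncompressed PCM WAV, so a FLAC file would need to be converted first.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_rate_hz, sample_width_bytes, n_frames) from a WAV header."""
    with wave.open(path, "rb") as wf:
        return (
            wf.getnchannels(),
            wf.getframerate(),
            wf.getsampwidth(),
            wf.getnframes(),
        )


def is_mono_16khz(path):
    """True if the file is single-channel and sampled at 16 kHz (hypothetical helper)."""
    channels, rate, _, _ = describe_wav(path)
    return channels == 1 and rate == 16000
```

This only inspects the header; it says nothing about whether the model or the Torch installation is behaving correctly.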

Thanks!

SeanNaren commented 7 years ago

Any progress on the issue? This is really strange. I'll double-check the models, but they should be OK.

leandrosnm commented 7 years ago

Hi, I haven't been able to work on this any further. My guess is that something might be wrong with my Torch setup. I'll check it again and post an update as soon as I can.

Thanks!

saurabhvyas commented 7 years ago

I'm curious how you use the Predict.lua file. Specifically, how do you generate the .wav file for prediction? I generate mine using the HTML5 audio recording API, and when I tried to run a prediction on it I got this error: "Multi-channel stft not supported at /tmp/luarocks_audio-0.1-0-9389/lua---audio/generic/audio.c:88"

saurabhvyas commented 7 years ago

Never mind, I found my answer: I need to convert the recording to a single channel (mono). I am using an online audio-to-.wav converter.
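As an offline alternative to an online converter, the downmix to mono can be sketched with Python's standard library. This is only an illustration under stated assumptions: the function name `downmix_to_mono` is hypothetical, the input is assumed to be uncompressed 16-bit PCM WAV (browser recordings may use other encodings, which the `wave` module will reject), and the channels are simply averaged. It does not resample, so the sample rate still has to be 16 kHz already.

```python
import array
import sys
import wave


def downmix_to_mono(src_path, dst_path):
    """Average all channels of a 16-bit PCM WAV into one channel; return the frame count."""
    with wave.open(src_path, "rb") as src:
        n_channels = src.getnchannels()
        sample_width = src.getsampwidth()
        frame_rate = src.getframerate()
        raw = src.readframes(src.getnframes())

    if sample_width != 2:
        raise ValueError("this sketch only handles 16-bit PCM samples")

    # Samples are interleaved: ch0, ch1, ..., ch0, ch1, ...
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian

    # Average each frame's channels into a single sample.
    mono = array.array(
        "h",
        (
            sum(samples[i:i + n_channels]) // n_channels
            for i in range(0, len(samples), n_channels)
        ),
    )
    n_frames = len(mono)

    if sys.byteorder == "big":
        mono.byteswap()  # back to little-endian before writing
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(frame_rate)
        dst.writeframes(mono.tobytes())
    return n_frames
```

A call such as `downmix_to_mono("recording.wav", "recording_mono.wav")` (file names are placeholders) writes a single-channel copy that should no longer hit the multi-channel STFT check.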