SeanNaren / deepspeech.torch

Speech Recognition using DeepSpeech2 network and the CTC loss function.
MIT License

How to just evaluate a pre-trained network on an audio file? #87

Open devinbostIL opened 7 years ago

devinbostIL commented 7 years ago

Hi,

I was able to get my environment set up, and I want to try evaluating an existing model (such as the LibriSpeech network) to run speech-to-text on an audio file. I only want to perform the transcription. How do I go about this with your library? From the documentation I am not sure which steps are necessary, or how much extra development work (if any) I would need to do to transcribe audio with your library.

SeanNaren commented 7 years ago

Hey, my bad! I should update the docs sometime :) To do this, use the predict script like so:

th Predict.lua -modelPath /path/to/model.t7 -audioPath /path/to/audio.wav

There are further parameters if you need them; use the -help argument to see them!

devinbostIL commented 7 years ago

Thanks for the information!

I attempted to run the model, and it blew up with this message:

$ th Predict.lua -modelPath libri_deepspeech.t7 -audioPath '/home/devinbost/Downloads/speech_audio_files_sample/nameOfAudioFile.wav'
/home/devinbost/torch/install/bin/luajit: ...e/devinbost/torch/install/share/lua/5.1/nn/Container.lua:67: 
In 2 module of nn.Sequential:
In 3 module of nn.Sequential:
In 1 module of cudnn.BatchBRNNReLU:
/home/devinbost/torch/install/share/lua/5.1/nn/View.lua:47: input view (5107x1x1x1760) and desired view (5107x-1) do not match
stack traceback:
    [C]: in function 'error'
    /home/devinbost/torch/install/share/lua/5.1/nn/View.lua:47: in function 'batchsize'
    /home/devinbost/torch/install/share/lua/5.1/nn/View.lua:79: in function </home/devinbost/torch/install/share/lua/5.1/nn/View.lua:77>
    [C]: in function 'xpcall'
    ...e/devinbost/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
    .../devinbost/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function <.../devinbost/torch/install/share/lua/5.1/nn/Sequential.lua:41>
    [C]: in function 'xpcall'
    ...e/devinbost/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
    .../devinbost/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function <.../devinbost/torch/install/share/lua/5.1/nn/Sequential.lua:41>
    [C]: in function 'xpcall'
    ...e/devinbost/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
    .../devinbost/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
    Predict.lua:42: in main chunk
    [C]: in function 'dofile'
    ...bost/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
    [C]: at 0x00405d50

WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
    [C]: in function 'error'
    ...e/devinbost/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
    .../devinbost/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
    Predict.lua:42: in main chunk
    [C]: in function 'dofile'
    ...bost/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
    [C]: at 0x00405d50

Any ideas?

devinbostIL commented 7 years ago

Is it expecting me to pass it a table or a directory with a collection of audio files?

devinbostIL commented 7 years ago

I tried a different file and then also changed the sample rate; these are the error messages I got:

~/src/deepspeech.torch$ th Predict.lua -modelPath libri_deepspeech.t7 -audioPath '/home/devinbost/Downloads/speech_audio_files_sample/4402691.wav'
/home/devinbost/torch/install/bin/luajit: ...e/devinbost/torch/install/share/lua/5.1/nn/Container.lua:67: 
In 2 module of nn.Sequential:
In 3 module of nn.Sequential:
In 1 module of cudnn.BatchBRNNReLU:
/home/devinbost/torch/install/share/lua/5.1/nn/View.lua:47: input view (3951x1x1x1760) and desired view (3951x-1) do not match
stack traceback:
    [C]: in function 'error'
    /home/devinbost/torch/install/share/lua/5.1/nn/View.lua:47: in function 'batchsize'
    /home/devinbost/torch/install/share/lua/5.1/nn/View.lua:79: in function </home/devinbost/torch/install/share/lua/5.1/nn/View.lua:77>
    [C]: in function 'xpcall'
    ...e/devinbost/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
    .../devinbost/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function <.../devinbost/torch/install/share/lua/5.1/nn/Sequential.lua:41>
    [C]: in function 'xpcall'
    ...e/devinbost/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
    .../devinbost/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function <.../devinbost/torch/install/share/lua/5.1/nn/Sequential.lua:41>
    [C]: in function 'xpcall'
    ...e/devinbost/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
    .../devinbost/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
    Predict.lua:42: in main chunk
    [C]: in function 'dofile'
    ...bost/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
    [C]: at 0x00405d50

WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
    [C]: in function 'error'
    ...e/devinbost/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
    .../devinbost/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
    Predict.lua:42: in main chunk
    [C]: in function 'dofile'
    ...bost/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
    [C]: at 0x00405d50

~/src/deepspeech.torch$ th Predict.lua -modelPath libri_deepspeech.t7 -audioPath '/home/devinbost/Downloads/speech_audio_files_sample/4402691.wav' -sampleRate 13000
/home/devinbost/torch/install/bin/luajit: ...e/devinbost/torch/install/share/lua/5.1/nn/Container.lua:67: 
In 1 module of nn.Sequential:
In 7 module of nn.Sequential:
/home/devinbost/torch/install/share/lua/5.1/nn/View.lua:47: input view (1x32x26x4864) and desired view (1312x-1) do not match
stack traceback:
    [C]: in function 'error'
    /home/devinbost/torch/install/share/lua/5.1/nn/View.lua:47: in function 'batchsize'
    /home/devinbost/torch/install/share/lua/5.1/nn/View.lua:79: in function </home/devinbost/torch/install/share/lua/5.1/nn/View.lua:77>
    [C]: in function 'xpcall'
    ...e/devinbost/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
    .../devinbost/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function <.../devinbost/torch/install/share/lua/5.1/nn/Sequential.lua:41>
    [C]: in function 'xpcall'
    ...e/devinbost/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
    .../devinbost/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
    Predict.lua:42: in main chunk
    [C]: in function 'dofile'
    ...bost/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
    [C]: at 0x00405d50

WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
    [C]: in function 'error'
    ...e/devinbost/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
    .../devinbost/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
    Predict.lua:42: in main chunk
    [C]: in function 'dofile'
    ...bost/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
    [C]: at 0x00405d50

SeanNaren commented 7 years ago

Make sure the file is a 16 kHz WAV file; is that the case?

I've also added documentation here.
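To see why the sample rate matters, the numbers in the -sampleRate 13000 log can be reproduced with a little arithmetic. This is a sketch under assumptions that are not confirmed in this thread: Predict.lua builds a one-sided spectrogram over 20 ms windows (window/2 + 1 frequency bins), and the model's convolutional front end has 32 feature maps and two unpadded convolutions with kernel heights 41 and 21 and vertical strides 2 and 1. These values were picked because they reproduce both figures in that log: 26 frequency rows at 13000 Hz, and an expected input of 1312 = 32 × 41 features.

```shell
#!/bin/sh
# Sketch: features reaching the recurrent layers for a given sample rate.
# ASSUMED (not taken from deepspeech.torch): 20 ms windows, a one-sided
# spectrogram with window/2 + 1 bins, 32 feature maps, and two unpadded
# convolutions with kernel heights 41 and 21 and vertical strides 2 and 1.

# Output size of an unpadded convolution: conv_out SIZE KERNEL STRIDE
conv_out() {
    echo $(( ($1 - $2) / $3 + 1 ))
}

# rnn_input_size SAMPLE_RATE_HZ: frequency rows after both convs x 32 maps
rnn_input_size() {
    window=$(( $1 * 20 / 1000 ))   # samples in a 20 ms window
    bins=$(( window / 2 + 1 ))     # one-sided spectrogram bins
    h1=$(conv_out "$bins" 41 2)    # rows after the first convolution
    h2=$(conv_out "$h1" 21 1)      # rows after the second convolution
    echo $(( 32 * h2 ))
}

for rate in 16000 13000; do
    echo "$rate Hz -> $(rnn_input_size "$rate") features per time step"
done
```

Under these assumptions, 16000 Hz gives 1312 features (the desired view in the log), while 13000 Hz gives 832 = 32 × 26, so the reshape in front of the recurrent layers fails. Resampling the audio to 16 kHz, rather than overriding -sampleRate, keeps the feature size the model expects.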

sirmick commented 7 years ago

I'm having the same problem. I downloaded the pre-trained LibriSpeech model and am launching it with:

th Predict.lua -modelPath libri_deepspeech.t7 -audioPath amy.out.wav -dictionaryPath ./dictionary -nGPU 1

I'm running this against a WAV file that I downsampled to 16 kHz mono with:

sox amy.wav amy.out.wav rate 16k channels 1

It is a 16-bit file, if that matters.

I get a very similar error when I run Predict.lua:

View.lua:47: input view (241x1x1x1760) and desired view (241x-1) do not match

If I figure out what I'm doing wrong, I'd be happy to contribute some better documentation or strengthen the input file checking in Predict.lua so it throws actionable errors.
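Along the lines of the input-file checking suggested here, a hypothetical pre-flight script can read the sample rate, channel count and bit depth straight from the WAV header before Predict.lua is run. The name check_wav.sh and its functions are illustrative, not part of deepspeech.torch. It is a sketch that assumes the common header layout where the "fmt " chunk starts at byte 12; files with other chunks before it would need a proper chunk walk. Only od and awk are needed.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of deepspeech.torch): report the
# sample rate, channel count and bit depth of a PCM WAV file and fail unless
# it is 16 kHz. ASSUMES the "fmt " chunk starts at byte 12 of the file.

# read_le FILE OFFSET NBYTES: unsigned little-endian integer at OFFSET
read_le() {
    od -An -t u1 -j "$2" -N "$3" "$1" |
        awk '{ v = 0; for (i = NF; i >= 1; i--) v = v * 256 + $i; print v }'
}

# read_tag FILE OFFSET: four ASCII characters at OFFSET, spaces removed
read_tag() {
    od -An -c -j "$2" -N 4 "$1" | tr -d ' \n'
}

check_wav() {
    f="$1"
    if [ "$(read_tag "$f" 0)" != "RIFF" ] || [ "$(read_tag "$f" 8)" != "WAVE" ] ||
       [ "$(read_tag "$f" 12)" != "fmt" ]; then
        echo "$f: not a WAV file with a leading fmt chunk" >&2
        return 1
    fi
    channels=$(read_le "$f" 22 2)   # bytes 22-23
    rate=$(read_le "$f" 24 4)       # bytes 24-27
    bits=$(read_le "$f" 34 2)       # bytes 34-35
    echo "$f: $rate Hz, $channels channel(s), $bits-bit"
    # String comparison so a truncated header (empty value) also fails.
    if [ "$rate" != "16000" ]; then
        echo "$f: expected 16000 Hz; try: sox $f out.wav rate 16k channels 1" >&2
        return 1
    fi
    if [ "$channels" != "1" ]; then
        # Mono is an assumption here; the thread only requires 16 kHz.
        echo "$f: warning: $channels channels, mono is assumed" >&2
    fi
    return 0
}

# When run as a script with a file argument, check that file.
if [ "$#" -gt 0 ]; then
    check_wav "$1"
fi
```

For example, sh check_wav.sh amy.out.wav could be run before th Predict.lua. Note that the file described above is already 16 kHz mono, so a passing check does not explain the View.lua mismatch reported in this thread; it only turns a wrong sample rate into a clear message instead of a stack trace.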