SeanNaren / deepspeech.torch

Speech recognition using the DeepSpeech2 network, trained with the CTC loss function.
MIT License

deepspeech.torch

Speech recognition with Baidu Warp-CTC using Torch7. Creates a network based on the DeepSpeech2 architecture, trained with the CTC (Connectionist Temporal Classification) loss function.

Features

Branches

There are currently two branches, Master and Phoneme:

Installation/Data Preparation/Documentation

Follow the installation, data preparation, and documentation instructions found in the wiki here to set up and run the code.

Technical documentation can be found here.

Pre-trained Networks

Pre-trained networks are available for AN4 and LibriSpeech. They are CUDA only, since they use cudnn RNNs. Download links and error rates (WER and CER) are listed below. DeepSpeech-light is a smaller model that is less intensive to train (based on LSTMs rather than RNNs).

AN4

an4Test

| Network | WER | CER | Link |
|---|---|---|---|
| DeepSpeech-light | N/A | N/A | N/A |
| DeepSpeech | 12 | 3.07 | Download |

LibriSpeech

Librispeech-test-clean

| Network | WER | CER | Link |
|---|---|---|---|
| DeepSpeech-light | 15 | 1.34 | Download |
| DeepSpeech | 12 | 1.55 | Download |

Librispeech-test-other

| Network | WER | CER | Link |
|---|---|---|---|
| DeepSpeech-light | 36 | 3.80 | (Download Above) |
| DeepSpeech | 33 | 3.24 | (Download Above) |
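
The WER (word error rate) and CER (character error rate) columns in the tables above are conventionally computed as the edit distance between the reference transcript and the network's output, divided by the reference length. The Python sketch below illustrates that standard definition only. It is not taken from this project's Lua code, the function names are hypothetical, and any text normalization or per-utterance averaging the project applies before scoring is not reproduced here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from ref
                            curr[j - 1] + 1,      # insertion into ref
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate as a fraction: word-level edits / reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate as a fraction: character edits / reference length."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Example: one substituted word and one dropped word out of six reference words.
reference = "the cat sat on the mat"
hypothesis = "the cat sit on mat"
print("WER:", wer(reference, hypothesis))
print("CER:", cer(reference, hypothesis))
```

Both functions return a fraction; multiply by 100 to express the result as a percentage. Because CER counts single characters, it is usually much lower than WER for the same output, which matches the pattern in the tables.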

Once the project is set up, you can start training from the pre-trained networks above using the parameters below (you may also need to change other parameters described in the wiki):

th Train.lua -loadModel -loadPath /path/to/model.t7

Acknowledgements

Many people who helped with or contributed to this project deserve recognition: