Speech-to-Text-WaveNet: End-to-end sentence-level Chinese speech recognition using DeepMind's WaveNet
A TensorFlow implementation of Chinese speech recognition based on DeepMind's WaveNet: A Generative Model for Raw Audio (hereafter "the Paper").
Version
Current Version : 0.0.1
Dependencies
- python == 3.5
- tensorflow == 1.0.0
- librosa == 0.5.0
Dataset
Tsinghua 30-hour Chinese speech dataset
Directories
- cache: cached data features and the word dictionary
- data: wav files and their labels
- model: saved models
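
To illustrate what a word dictionary like the one stored in cache might look like, here is a minimal sketch that maps each character in the label transcripts to an integer id. The function names (`build_word_dictionary`, `encode`), the sample transcripts, and the frequency-based ordering are all assumptions made for illustration; the repository's actual format may differ.

```python
from collections import Counter


def build_word_dictionary(transcripts):
    """Map each distinct non-space character to an integer id (hypothetical helper)."""
    counts = Counter(
        ch for line in transcripts for ch in line if not ch.isspace()
    )
    # Sort by descending frequency, then by character, so ids are deterministic.
    ordered = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return {ch: idx for idx, ch in enumerate(ordered)}


def encode(transcript, word_to_id):
    """Turn a transcript into a list of ids, skipping whitespace."""
    return [word_to_id[ch] for ch in transcript if not ch.isspace()]


# Made-up label lines standing in for the transcripts under data/.
labels = ["今天 天气 很好", "天气 不错"]
word_to_id = build_word_dictionary(labels)
print(encode("天气", word_to_id))
```

Giving the most frequent characters the smallest ids is only one possible convention; any stable mapping works as long as training and testing share the same dictionary.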
Network model
- Random shuffling of the data every epoch
- Xavier initialization
- Adam optimization algorithm
- Batch Normalization
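
Two of the items above, per-epoch shuffling and Xavier initialization, can be sketched in plain Python. This is a simplified illustration, not the code in train.py: the helper names (`xavier_uniform`, `epoch_batches`), the layer sizes, and the batch size are assumptions, and the uniform variant of Xavier initialization is used here as one common choice.

```python
import math
import random


def xavier_uniform(fan_in, fan_out, rng):
    """Xavier (Glorot) uniform init: U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [
        [rng.uniform(-limit, limit) for _ in range(fan_out)]
        for _ in range(fan_in)
    ]


def epoch_batches(samples, batch_size, rng):
    """Yield mini-batches over a freshly shuffled copy of the samples."""
    order = list(samples)
    rng.shuffle(order)  # a new random order each time this is called
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]


rng = random.Random(0)
weights = xavier_uniform(fan_in=128, fan_out=64, rng=rng)  # hypothetical layer shape

wav_ids = list(range(10))  # stand-ins for indices of training utterances
for epoch in range(2):
    batches = list(epoch_batches(wav_ids, batch_size=4, rng=rng))
    print(epoch, [len(b) for b in batches])
```

Calling `epoch_batches` once per epoch reshuffles the data, so every sample is still seen exactly once per epoch but in a different order.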
Train the network
python3 train.py
Test the network
python3 test.py
Other resources
- TensorFlow Exercise 15: Chinese Speech Recognition
- ibab's WaveNet (speech synthesis) TensorFlow implementation
- buriburisuri's WaveNet (English speech recognition) TensorFlow and sugartensor implementation