MXNet implementation of RNN Transducer (Graves 2012): Sequence Transduction with Recurrent Neural Networks
End-to-End Speech Recognition using RNN-Transducer
File description
eval.py: rnnt joint model decode
model.py: RNNT model, containing the acoustic model and the phoneme model
model2012.py: RNNT model following Graves 2012
seq2seq/*: seq2seq with attention
rnnt_np.py: RNNT loss function implemented for MXNet, supporting both the symbol and gluon APIs; based on a PyTorch implementation
DataLoader.py: data process
train.py: rnnt training script, can be initialized from CTC and PM model
train_ctc.py: ctc training script
train_att.py: attention training script
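To make the role of the RNNT loss more concrete, here is a minimal NumPy sketch of the forward recursion from Graves 2012. It is not the code in rnnt_np.py: the function name `rnnt_neg_log_likelihood`, the `(T, U+1, V)` layout of the joint network output, and the blank index 0 are all assumptions made for illustration.

```python
import numpy as np

def rnnt_neg_log_likelihood(log_probs, labels, blank=0):
    """Negative log-likelihood of `labels` under a transducer lattice.

    log_probs: array of shape (T, U+1, V) holding log-softmax outputs of the
               joint network (assumed layout, not taken from the repo).
    labels:    sequence of U integer label ids (no blanks).
    """
    T, U1, _ = log_probs.shape
    U = len(labels)
    assert U1 == U + 1, "second axis must have length U + 1"

    # alpha[t, u]: log-probability of having emitted labels[:u] by frame t.
    alpha = np.full((T, U + 1), -np.inf)
    alpha[0, 0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            candidates = []
            if t > 0:
                # Reached by emitting blank at (t-1, u), which advances time.
                candidates.append(alpha[t - 1, u] + log_probs[t - 1, u, blank])
            if u > 0:
                # Reached by emitting labels[u-1] at (t, u-1).
                candidates.append(alpha[t, u - 1] + log_probs[t, u - 1, labels[u - 1]])
            alpha[t, u] = np.logaddexp.reduce(candidates)

    # Every alignment ends with a final blank from the last lattice node.
    return -(alpha[T - 1, U] + log_probs[T - 1, U, blank])

# Toy check: uniform distribution over 2 symbols, T=2 frames, U=1 label.
uniform = np.log(np.full((2, 2, 2), 0.5))
loss = rnnt_neg_log_likelihood(uniform, [1])
```

This version runs in O(T·U) per utterance and is only meant to show the recursion; a training loss would also need the backward pass and batching.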
Directory description
conf: kaldi feature extraction config
Reference Paper
Run
Compile RNNT Loss
Follow the instructions here to compile MXNet with the RNNT loss.
Extract feature
link the kaldi timit example dirs (local, steps, utils)
execute run.sh to extract 40-dim fbank features
run feature_transform.sh to get the 123-dim features described in Graves 2013
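For orientation, the 123 dimensions in Graves 2013 are 40 filter-bank coefficients plus energy (41 values), with first and second temporal derivatives appended (41 × 3 = 123). The sketch below shows that arithmetic using a standard regression-based delta. The function names and the window size of 2 are assumptions; feature_transform.sh may compute the derivatives differently, for example with Kaldi tools.

```python
import numpy as np

def delta(feats, window=2):
    """Regression-based temporal derivative of a (frames, dims) matrix."""
    num_frames = feats.shape[0]
    # Repeat the first and last frames so every frame has full context.
    padded = np.pad(feats, ((window, window), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out = np.zeros_like(feats, dtype=float)
    for n in range(1, window + 1):
        ahead = padded[window + n: window + n + num_frames]
        behind = padded[window - n: window - n + num_frames]
        out += n * (ahead - behind)
    return out / denom

def add_deltas(feats, window=2):
    """Append first and second derivatives: (frames, D) -> (frames, 3 * D)."""
    d1 = delta(feats, window)
    d2 = delta(d1, window)
    return np.concatenate([feats, d1, d2], axis=1)

# 40 fbank coefficients + energy = 41 static dims per frame (random stand-in data).
static = np.random.randn(100, 41)
full = add_deltas(static)
print(full.shape)  # (100, 123)
```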
Train RNNT model:
python train.py --lr 1e-3 --bi --dropout .5 --out exp/rnnt_bi_lr1e-3 --schedule
Evaluation
By default, evaluation only supports the RNNT model.
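As a rough illustration of what greedy transducer decoding involves, the sketch below walks over the frames and keeps emitting the most likely symbol until the joint network predicts blank. It is not taken from eval.py: `rnnt_greedy_decode`, the `joint_log_probs(t, prefix)` callback, the blank index 0, and the per-frame emission cap are hypothetical names and choices.

```python
import numpy as np

def rnnt_greedy_decode(joint_log_probs, num_frames, blank=0, max_symbols_per_frame=10):
    """Greedy transducer decoding.

    joint_log_probs(t, prefix) must return a 1-D array of log-probabilities
    over the vocabulary for frame t given the labels emitted so far.
    """
    prefix = []
    for t in range(num_frames):
        # Emit labels at this frame until blank wins (or the cap is reached).
        for _ in range(max_symbols_per_frame):
            k = int(np.argmax(joint_log_probs(t, tuple(prefix))))
            if k == blank:
                break  # blank advances to the next frame
            prefix.append(k)
    return prefix

def toy_joint(t, prefix):
    """Stand-in model: emits 1 at frame 0 and 2 at frame 1, blank otherwise."""
    scores = np.full(3, -10.0)
    wanted = {(0, 0): 1, (1, 1): 2}.get((t, len(prefix)), 0)
    scores[wanted] = 0.0
    return scores

decoded = rnnt_greedy_decode(toy_joint, num_frames=3)
```

In a real model the callback would combine the acoustic encoder output at frame `t` with the prediction network state for `prefix`; the cap on symbols per frame only guards against a model that never predicts blank.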
Results
CTC
| Decode   | PER (%) |
|----------|---------|
| greedy   | 20.36   |
| beam 100 | 20.03   |
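The greedy row for CTC presumably refers to best-path decoding: take the most likely symbol per frame, collapse consecutive repeats, then drop blanks. A minimal sketch, assuming blank index 0 and a hypothetical function name:

```python
import numpy as np

def ctc_greedy_decode(log_probs, blank=0):
    """Best-path CTC decoding of a (frames, vocab) score matrix."""
    best = np.argmax(log_probs, axis=1)
    decoded = []
    prev = None
    for k in best:
        # Keep a symbol only when it differs from the previous frame
        # and is not the blank symbol.
        if k != prev and k != blank:
            decoded.append(int(k))
        prev = k
    return decoded

# One-hot scores whose per-frame argmax is 0 1 1 0 1 2 2 0.
frames = np.eye(3)[[0, 1, 1, 0, 1, 2, 2, 0]]
hyp = ctc_greedy_decode(frames)
```

Here the blank between the two runs of 1 keeps them as separate symbols, while the repeated 2 collapses to a single one.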
Transducer
| Decode  | PER (%) |
|---------|---------|
| greedy  | 20.74   |
| beam 40 | 19.84   |
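PER in the tables above is the phoneme error rate: the edit distance between hypothesis and reference phoneme sequences, summed over utterances and divided by the total reference length. The sketch below shows that computation. The function names are hypothetical, and the repository's scoring may differ in details such as phoneme mapping.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (rolling single-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1]

def phoneme_error_rate(refs, hyps):
    """PER in percent over paired lists of phoneme sequences."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return 100.0 * errors / total

# One deletion against a four-phoneme reference.
per = phoneme_error_rate([["sil", "k", "ae", "t"]], [["sil", "k", "t"]])
```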
Requirements
Python 3.6
MXNet 1.1.0
numpy 1.14
TODO
beam search acceleration
Seq2Seq with attention