dspavankumar / keras-kaldi

Keras Interface for Kaldi ASR
GNU General Public License v3.0

question final.mdl #1

Closed vince62s closed 7 years ago

vince62s commented 7 years ago

Hi Pavan, your work looks great and I am quite interested in trying it. I am not familiar with the nnet1 framework, more with nnet3, and I don't exactly understand the full pipeline. I understand the final.mdl just comes from the GMM training as is and is not changed. This means that the .h5 NN model is required at decoding time.

Am I correct? Would there be a way to recompute a final.mdl that could be nnet3-compatible?

Also, did you notice slow training times or anything similar?

And at decoding time, is there a big difference between GPU and CPU?

Thanks, and congrats again.

dspavankumar commented 7 years ago

Hello,

Thank you. As you mentioned, the final.mdl is replicated from the GMM directory. The neural network dnn.nnet.h5 has no transition model, and so we need the final.mdl during decoding (transitions are not trained in this setup).
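To make that concrete, here is a minimal sketch (with synthetic input frames, assuming the trained dnn.nnet.h5 from this setup is available) of what the .h5 model alone can do at decode time: it produces per-frame pdf posteriors, but mapping those onto the decoding graph still needs the transition model stored in final.mdl.

```python
# Minimal sketch: the .h5 file holds only the acoustic network.
# It can compute per-frame pdf posteriors, but the HMM transitions
# needed by the decoder still come from the GMM's final.mdl.
import numpy as np
from keras.models import load_model

dnn = load_model('dnn.nnet.h5')                    # acoustic model only, no transition model
feats = np.random.rand(100, dnn.input_shape[1])    # 100 spliced feature frames (synthetic)
posteriors = dnn.predict(feats)                    # shape: (100, num_pdfs)
print(posteriors.shape)
```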

I haven't used nnet3, so I can't compare them. However, I guess nnet3 is implemented in C++, so it should be faster than this Python setup.

Regarding decoding on CPU vs GPU: I used small (3-hidden-layer) models to test short utterances (less than five seconds). The GPU actually took a little longer, perhaps because of moving data in and out of its memory. With larger models and longer utterances, the forward pass would be considerably faster on GPU. Training on GPU was certainly several times faster than on CPU.

To convert the models to nnet3, the DNN first needs to be converted into nnet3's raw format, so the weights and biases of each DNN layer need to be printed from Python. The model can be loaded using the load_model() method of keras.models. Each layer in the model's layers list has a get_weights() method that returns the weight matrix and the bias vector as a list. Once we have the DNN in raw format, we could feed nnet3-am-init the available final.mdl and initialise it with the trained weights and the GMM's transition model.
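As a rough sketch of the extraction step (the text files written below are not in nnet3's raw format; they only show where the parameters come from):

```python
# Rough sketch of extracting the trained parameters from dnn.nnet.h5.
# The output files here are NOT nnet3's raw format; they only illustrate
# how to pull out the weight matrices and bias vectors layer by layer.
import numpy as np
from keras.models import load_model

model = load_model('dnn.nnet.h5')

for i, layer in enumerate(model.layers):
    params = layer.get_weights()          # Dense layers return [weights, biases]
    if len(params) != 2:
        continue                          # skip layers without trainable parameters
    weights, biases = params
    np.savetxt('layer%02d.weights.txt' % i, weights)
    np.savetxt('layer%02d.biases.txt' % i, biases)
    print('layer %d: W %s, b %s' % (i, weights.shape, biases.shape))
```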

Thanks, Pavan.

vince62s commented 7 years ago

Thanks. Were you never tempted to do a full DNN training from the filterbank features directly?

dspavankumar commented 7 years ago

I believe filterbanks perform similarly to MFCCs; untruncated MFCCs are just a linear transformation of the log filterbanks. I did test them some time ago, but I don't have the results at hand.
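As a quick numerical check of that claim (with a synthetic 23-bin log filterbank vector, assuming SciPy is available): an orthonormal DCT maps the log filterbank energies to the full set of cepstral coefficients, and the inverse DCT recovers them exactly.

```python
# Check that untruncated MFCCs are an invertible linear (DCT) transform of
# log filterbank energies, so the two carry the same information.
# The 23-bin vector here is synthetic, just for illustration.
import numpy as np
from scipy.fftpack import dct, idct

log_fbank = np.random.rand(23)                        # log mel filterbank energies
mfcc_full = dct(log_fbank, type=2, norm='ortho')      # keep all 23 coefficients
recovered = idct(mfcc_full, type=2, norm='ortho')     # inverse transform

print(np.allclose(log_fbank, recovered))              # True
```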

vince62s commented 7 years ago

Oh yes, I know. I meant without the GMM/HMM transition model: the features feeding the network directly.

dspavankumar commented 7 years ago

Like a sequence-to-sequence RNN? No, I didn't, but it's definitely interesting to look into.

vince62s commented 7 years ago

Yes, that's what I have in mind, but what I don't get is that most implementations of this use a CTC loss function. A purely seq2seq approach does not seem to be easily doable.