Closed: vince62s closed this issue 7 years ago
Hello,
Thank you. As you mentioned, the final.mdl is replicated from the GMM directory. The neural network dnn.nnet.h5 has no transition model, and so we need the final.mdl during decoding (transitions are not trained in this setup).
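To make the decode-time flow concrete, here is a rough sketch (my own, not the repo's actual script) of how the two pieces fit together: the Keras DNN produces frame posteriors, these are turned into pseudo log-likelihoods by dividing out the state priors, and the result is handed, together with the GMM-trained final.mdl, to a Kaldi decoder such as latgen-faster-mapped. The kaldi_io package, the priors.txt file and the paths are assumptions, and the feature splicing/normalisation that the real scripts perform is omitted.

```python
# Sketch only: assumes kaldi_io (https://github.com/vesis84/kaldi-io-for-python),
# a priors.txt file with per-pdf state priors, and unspliced features.
import numpy as np
from keras.models import load_model
import kaldi_io

model = load_model('dnn.nnet.h5')
log_priors = np.log(np.loadtxt('priors.txt') + 1e-10)    # per-pdf state priors

with open('loglikes.ark', 'wb') as out:
    for utt, feats in kaldi_io.read_mat_scp('data/test/feats.scp'):
        post = model.predict(feats)                       # frame-level posteriors
        loglikes = np.log(post + 1e-10) - log_priors      # pseudo log-likelihoods
        kaldi_io.write_mat(out, loglikes, key=utt)

# The transition model inside final.mdl is then used by the decoder, e.g.:
#   latgen-faster-mapped final.mdl HCLG.fst ark:loglikes.ark ark:lat.ark
```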
I haven't used nnet3, so I can't compare them. However, since nnet3 is implemented in C++, I would expect it to be faster than Python.
Regarding decoding on CPU vs. GPU: I tested small (3-hidden-layer) models on short utterances (under five seconds). The GPU actually took slightly longer, probably because of the overhead of moving data in and out of its memory. With larger models and longer utterances, the forward pass should be considerably faster on the GPU. Training on the GPU was certainly several times faster than on the CPU.
To convert the models to nnet3, the DNN first needs to be converted into nnet3's raw format. The weights and biases of each DNN layer need to be printed from Python: the model can be loaded with the load_model() method of keras.models, and each layer in the model's layers list has a get_weights() method that returns the weight matrix and bias vector as a list. Once we have the DNN in raw format, we could feed nnet3-am-init with the trained weights and the transition model from the available (GMM-trained) final.mdl.
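A minimal sketch of the extraction step described above (not part of the repo): pull each Dense layer's weight matrix and bias vector out of the trained Keras model. The file names and the plain-text output are assumptions; formatting the matrices into an actual nnet3 raw model is a separate step.

```python
import numpy as np
from keras.models import load_model

model = load_model('dnn.nnet.h5')

for i, layer in enumerate(model.layers):
    params = layer.get_weights()       # empty list for Activation/Dropout layers
    if len(params) != 2:
        continue                       # keep only layers with a kernel and a bias
    W, b = params                      # W: (in_dim, out_dim), b: (out_dim,)
    # nnet3 affine components store the matrix as (out_dim, in_dim), so transpose.
    np.savetxt('layer%d.weights.txt' % i, W.T, fmt='%.6f')
    np.savetxt('layer%d.bias.txt' % i, b.reshape(1, -1), fmt='%.6f')
```

The dumped parameters would then be wrapped in nnet3's raw/text syntax and combined with the GMM transition model via nnet3-am-init, as described above.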
Thanks, Pavan.
Thanks. Were you never tempted to do full DNN training directly from the filterbank features?
I believe filterbanks perform similarly to MFCCs; they are just a linear transformation of untruncated MFCCs. I did test them some time ago, but I don't have the results at hand.
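A small illustration of that point (my own, not from the repo): untruncated MFCCs are an orthonormal DCT of the log mel filterbank energies, so the two representations carry the same information and a DNN can in principle use either. Random data stands in for real features here, and liftering/energy handling are ignored.

```python
import numpy as np
from scipy.fftpack import dct, idct

np.random.seed(0)
log_fbank = np.log(np.random.rand(100, 23) + 1e-6)        # fake 23-bin log-mel features

mfcc_full = dct(log_fbank, type=2, norm='ortho', axis=1)   # keep all coefficients
recovered = idct(mfcc_full, type=2, norm='ortho', axis=1)  # invert the DCT

print(np.allclose(recovered, log_fbank))                   # True: nothing is lost
```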
Oh yes, I know. I meant without the GMM/HMM transition model: feeding the features directly to the network.
Like a sequence-to-sequence RNN? No, I didn't, but it's definitely interesting to look into.
Yes, that's what I have in mind, but what I don't get is that most implementations of this use a CTC loss function; a purely seq2seq approach does not seem to be easily doable.
Hi Pavan, your work looks great; I am quite interested in trying it. I am not familiar with the nnet1 framework, more with nnet3, and I don't fully understand the pipeline. I understand that final.mdl just comes from the GMM training as-is and is not changed, which means the .h5 neural-network model is required at decoding time.
Am I correct? Would there be a way to recompute a final.mdl that would be nnet3-compatible?
Also, did you notice slow training times or anything similar?
And at decoding time, is there a big difference between GPU and CPU?
Thanks, and congrats again.