Which AM is used in Kaldi?

joan126 commented 3 years ago

Hi, I am using MFA for forced alignment between phonemes and audio. I want to know whether an nnet3 or chain model is used when training MFA from scratch. As far as I know, a TDNN in nnet3 is better for alignment.

mmcauliffe commented 3 years ago

Neither, it just uses the GMM-HMM pipeline for training through LDA+SAT. The experiments that we did with nnet2 several years ago showed very slight benefits, if any, over the GMM acoustic models, which in my mind didn't justify the increase in complexity and training time. If you've found better alignment performance with nnet3, I'd be curious to see that, but from what I understand, nnet training in Kaldi generally takes the alignments from the GMM model and doesn't optimize any further on alignment.
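For readers unfamiliar with the pipeline mentioned above, here is a rough sketch of how a GMM-HMM recipe that runs "through LDA+SAT" is typically staged. The stage names, feature descriptions, and the `Stage` class are illustrative assumptions based on the conventional Kaldi recipe layout, not MFA's actual API or configuration; the point is only that each stage starts from the alignments produced by the previous one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Stage:
    """One training pass in a GMM-HMM recipe (hypothetical structure)."""
    name: str
    features: str
    init_alignments_from: Optional[str]  # None means a flat start


def gmm_hmm_pipeline() -> List[Stage]:
    # Assumed ordering, following the conventional Kaldi-style recipe:
    # monophone -> triphone -> LDA+MLLT -> SAT (fMLLR).
    return [
        Stage("monophone", "MFCC + deltas", None),
        Stage("triphone", "MFCC + deltas", "monophone"),
        Stage("lda", "LDA+MLLT on spliced MFCCs", "triphone"),
        Stage("sat", "LDA+MLLT + per-speaker fMLLR", "lda"),
    ]


def describe(stages: List[Stage]) -> List[str]:
    """Return one human-readable line per stage."""
    lines = []
    for stage in stages:
        source = stage.init_alignments_from or "flat start"
        lines.append(
            f"{stage.name}: features={stage.features}; "
            f"initial alignments from {source}"
        )
    return lines


if __name__ == "__main__":
    for line in describe(gmm_hmm_pipeline()):
        print(line)
```

Under this picture, a neural network model trained afterwards would take the final SAT alignments as its training targets, which is why it would not necessarily improve on them.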

joan126 commented 3 years ago

Thanks for the reply, I get it! I used a 15-hour, high-quality TTS dataset to train from scratch. However, the alignment results are not accurate. I am wondering whether the dataset is too small. Can you give some suggestions for improving alignment accuracy?

MontrealCorpusTools / Montreal-Forced-Aligner
