MontrealCorpusTools / Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi
https://montrealcorpustools.github.io/Montreal-Forced-Aligner/
MIT License

about state alignment #62

Open yanyanxixi opened 6 years ago

yanyanxixi commented 6 years ago

Hi, can this tool produce a state-level alignment, i.e. split each phone into its HMM states?

thanks

cveaux commented 6 years ago

Hi, the Montreal-Forced-Aligner only gives the phone and word alignments, which makes sense since its main purpose is prosody research and not TTS. However, you can retrieve the state alignment from the files generated by Kaldi. Assuming you have a single speaker "speaker_A" and that the intermediate files are written to ~/Documents/MFA, you can use the Kaldi tools ali-to-pdf and ali-to-post:

ali-to-pdf ~/Documents/MFA/speaker_A/tri/final.mdl ark:tri/ali.0 ark:- | ali-to-post ark:- ark,t:-

This command outputs the pdf id on the best alignment path for every time frame of each aligned file:

file_id [ pdf_id 1 ] [ pdf_id 1 ] ...
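As a rough illustration, the text output above could be post-processed with a few lines of Python. This is only a sketch: the function names `parse_pdf_posts` and `run_lengths` are made up for this example, and it assumes each line has the `file_id [ pdf_id weight ] ...` shape shown above.

```python
import re
from typing import Dict, List, Tuple

# Matches one "[ pdf_id weight ]" group, e.g. "[ 12 1 ]" (assumed format).
FRAME_RE = re.compile(r"\[\s*(\d+)\s+\S+\s*\]")


def parse_pdf_posts(text: str) -> Dict[str, List[int]]:
    """Map each file id to its per-frame list of pdf ids."""
    result: Dict[str, List[int]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The first whitespace-separated field is the file id.
        file_id, _, rest = line.partition(" ")
        result[file_id] = [int(pdf) for pdf in FRAME_RE.findall(rest)]
    return result


def run_lengths(ids: List[int]) -> List[Tuple[int, int]]:
    """Collapse consecutive repeats into (id, n_frames) pairs."""
    runs: List[Tuple[int, int]] = []
    for i in ids:
        if runs and runs[-1][0] == i:
            runs[-1] = (i, runs[-1][1] + 1)
        else:
            runs.append((i, 1))
    return runs


# Hypothetical sample line, not real aligner output.
sample = "utt1 [ 2 1 ] [ 2 1 ] [ 7 1 ]"
pdf_ids = parse_pdf_posts(sample)["utt1"]
print(run_lengths(pdf_ids))
```

The run lengths are in frames. Converting them to seconds requires the frame shift used for feature extraction, which this sketch does not assume.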

You can relate a pdf id to a given state by using the output of the function show-transitions:

show-transitions ~/Documents/MFA/speaker_A/dictionary/phones.txt tri/final.mdl

which displays the structure of the HMM transition model used for the alignment:

Transition-state 1: phone = sil hmm-state = 0 pdf = 2
 Transition-id = 1 p = 0.918591 [self-loop]
 Transition-id = 2 p = 0.0259848 [0 -> 1]
...

mmcauliffe commented 6 years ago

Thanks for this! I'd looked into it a little while back as a way of maybe outputting allophonic transcriptions rather than phonemic ones, but haven't had time to devote to figuring it out.

cveaux commented 6 years ago

Glad if it helps! The MFA is a great tool that has really helped me. It greatly simplifies the use of Kaldi, especially the preparation of all the files in the lang directory and the language model. Thanks for this project.

cveaux commented 6 years ago

Hi, just coming back to this as I realised that I gave some wrong information here. In fact, the mapping between pdfs and states is not unique if the option SHARED is set to True (since some states will share the same pdf). But there is a unique mapping between the transition-ids of the model and its states. This mapping can also be read from the output of show-transitions, as in the command given above. The sequence of transition-ids associated with the alignment can be read from the final alignment file (in this case I stopped after triphone alignment):

copy-int-vector ark:tri/ali.0 ark,t:-
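
To make the mapping concrete, here is a minimal Python sketch that combines the two text outputs. It assumes the show-transitions output has the "Transition-state ... phone = ... hmm-state = ..." and "Transition-id = ..." lines shown earlier, and that copy-int-vector prints one line per file as `file_id tid tid ...`. All function names are hypothetical, and the second transition-state in the sample is invented for illustration.

```python
import re
from typing import Dict, List, Tuple

STATE_RE = re.compile(
    r"Transition-state\s+\d+:\s+phone\s*=\s*(\S+)\s+hmm-state\s*=\s*(\d+)"
)
TID_RE = re.compile(r"Transition-id\s*=\s*(\d+)")


def parse_show_transitions(text: str) -> Dict[int, Tuple[str, int]]:
    """Map each transition-id to its (phone, hmm_state)."""
    mapping: Dict[int, Tuple[str, int]] = {}
    current = None
    for line in text.splitlines():
        m = STATE_RE.search(line)
        if m:
            current = (m.group(1), int(m.group(2)))
            continue
        m = TID_RE.search(line)
        if m and current is not None:
            mapping[int(m.group(1))] = current
    return mapping


def parse_alignment(text: str) -> Dict[str, List[int]]:
    """Map each file id to its per-frame transition-ids."""
    result: Dict[str, List[int]] = {}
    for line in text.splitlines():
        fields = line.split()
        if fields:
            result[fields[0]] = [int(f) for f in fields[1:]]
    return result


def state_segments(
    tids: List[int], mapping: Dict[int, Tuple[str, int]]
) -> List[Tuple[str, int, int]]:
    """Collapse frames into (phone, hmm_state, n_frames) runs.

    Caveat: two adjacent instances of the same (phone, state) are merged.
    """
    segments: List[Tuple[str, int, int]] = []
    for tid in tids:
        phone, state = mapping[tid]
        if segments and segments[-1][:2] == (phone, state):
            segments[-1] = (phone, state, segments[-1][2] + 1)
        else:
            segments.append((phone, state, 1))
    return segments


# First state copied from the thread; the second state is made up.
transitions = """Transition-state 1: phone = sil hmm-state = 0 pdf = 2
 Transition-id = 1 p = 0.918591 [self-loop]
 Transition-id = 2 p = 0.0259848 [0 -> 1]
Transition-state 2: phone = sil hmm-state = 1 pdf = 3
 Transition-id = 3 p = 0.9 [self-loop]
 Transition-id = 4 p = 0.1 [1 -> 2]
"""
alignment = "utt1 1 1 2 3 3 4"  # hypothetical alignment line

mapping = parse_show_transitions(transitions)
print(state_segments(parse_alignment(alignment)["utt1"], mapping))
```

Because every transition-id belongs to exactly one transition-state, this lookup stays unambiguous even when several states share a pdf.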

mmcauliffe commented 6 years ago

Right, that makes sense, I remember playing around with show-transitions at one point. Great, thanks for the update!