Open yanyanxixi opened 6 years ago
Hi, the Montreal-Forced-Aligner only gives the phone and word alignments, which makes sense since its main purpose is prosody research and not TTS. However, you can retrieve the state alignment from the files generated by Kaldi. Assuming you have a single speaker "speaker_A" and that the intermediate files are written in ~/Documents/MFA, you can use the Kaldi tools ali-to-pdf and ali-to-post:
ali-to-pdf ~/Documents/MFA/speaker_A/tri/final.mdl ark:tri/ali.0 ark:- | ali-to-post ark:- ark,t:-
This command outputs the pdf ids associated with the best lattice path for every time frame of each aligned file:
file_id [ pdf_id 1 ] [ pdf_id 1 ] ...
You can relate a pdf id to a given state by using the output of show-transitions:
show-transitions ~/Documents/MFA/speaker_A/dictionary/phones.txt tri/final.mdl
which displays the structure of the finite state transducer used for the alignment:
Transition-state 1: phone = sil hmm-state = 0 pdf = 2
Transition-id = 1 p = 0.918591 [self-loop]
Transition-id = 2 p = 0.0259848 [0 -> 1]
...
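To make the lookup concrete, here is a minimal Python sketch that parses text in the format of the excerpt above. It builds two tables: transition-id to (phone, hmm-state, pdf), and pdf to the set of (phone, hmm-state) pairs that use it. The function name `parse_show_transitions` and the regular expressions are my own assumptions, written only against the three lines shown here; other Kaldi versions or topologies may print the pdf fields differently, so treat it as a starting point.

```python
import re
from collections import defaultdict

# Sample text copied from the show-transitions excerpt in this thread.
SAMPLE = """\
Transition-state 1: phone = sil hmm-state = 0 pdf = 2
Transition-id = 1 p = 0.918591 [self-loop]
Transition-id = 2 p = 0.0259848 [0 -> 1]
"""

# Assumed line formats, based only on the excerpt above.
STATE_RE = re.compile(
    r"Transition-state\s+(\d+):\s+phone\s+=\s+(\S+)\s+hmm-state\s+=\s+(\d+)\s+pdf\s+=\s+(\d+)"
)
TID_RE = re.compile(r"Transition-id\s+=\s+(\d+)")


def parse_show_transitions(text):
    """Return (tid_to_state, pdf_to_states) parsed from show-transitions text."""
    tid_to_state = {}
    pdf_to_states = defaultdict(set)
    current = None  # (phone, hmm_state, pdf) of the most recent Transition-state line
    for line in text.splitlines():
        m = STATE_RE.search(line)
        if m:
            current = (m.group(2), int(m.group(3)), int(m.group(4)))
            pdf_to_states[current[2]].add((current[0], current[1]))
            continue
        m = TID_RE.search(line)
        if m and current is not None:
            # Every transition-id belongs to the transition-state printed above it.
            tid_to_state[int(m.group(1))] = current
    return tid_to_state, dict(pdf_to_states)


tid_to_state, pdf_to_states = parse_show_transitions(SAMPLE)
```

If several states share a pdf, `pdf_to_states` will hold more than one entry for that pdf id, whereas `tid_to_state` always points at a single state.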
Thanks for this! I'd looked into it a little while back as a way of maybe outputting allophonic transcriptions rather than phonemic ones, but haven't had time to devote to figuring it out.
Glad if it helps! The MFA is a great tool that has really helped me. It greatly simplifies using Kaldi, especially preparing all the files in the lang directory and the language model. Thanks for this project.
Hi, just coming back to this as I realised that I gave some wrong information here.
In fact there is not a unique mapping between pdfs and states if the option SHARED is set to True (since some states will share the same pdfs). But there is a unique mapping between the transition-ids of the models and their states. This mapping can also be read from the output of show-transitions, as in the command given above. The sequence of transition-ids associated with the alignment can be read from the final alignment file (in this case I stopped after triphone alignment):
copy-int-vector ark:tri/ali.0 ark,t:-
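As a rough illustration of how the transition-id sequence can be turned into state segments, here is a minimal Python sketch. It assumes the text output has one line per file, with the file id followed by space-separated transition-ids, and it groups consecutive frames that map to the same state. The function name `state_segments` is hypothetical, and so is the mapping table: ids 1 and 2 come from the show-transitions excerpt earlier in the thread, while ids 3 and 4 and the sample alignment line are invented for the example.

```python
from itertools import groupby

# Hypothetical transition-id -> (phone, hmm-state) table.
# Ids 1 and 2 follow the show-transitions excerpt; 3 and 4 are made up.
TID_TO_STATE = {
    1: ("sil", 0),
    2: ("sil", 0),
    3: ("sil", 1),
    4: ("sil", 1),
}


def state_segments(ali_line, tid_to_state):
    """Split one text-format alignment line into per-state frame ranges.

    Returns (file_id, segments), where each segment is
    (phone, hmm_state, start_frame, end_frame) with end_frame exclusive.
    """
    file_id, *tids = ali_line.split()
    states = [tid_to_state[int(t)] for t in tids]
    segments = []
    frame = 0
    # Consecutive frames in the same state form one segment.
    for (phone, hmm_state), run in groupby(states):
        n_frames = len(list(run))
        segments.append((phone, hmm_state, frame, frame + n_frames))
        frame += n_frames
    return file_id, segments


# Invented alignment line, in the assumed "file_id tid tid ..." format.
file_id, segments = state_segments("file_id 1 1 1 2 3 3 4", TID_TO_STATE)
```

Frame indices can be converted to seconds by multiplying by the frame shift used for feature extraction (10 ms is a common Kaldi setting, but check your configuration). Note that grouping by runs merges two back-to-back visits to the same state into one segment, so this is only an approximation in that case.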
Right, that makes sense, I remember playing around with show-transitions at one point. Great, thanks for the update!
Hi, can this be used to get the state-level alignment, i.e. to split each phone into its states?
Thanks