ikhfa closed this issue 6 years ago
If not, it would be nice if we could use some kind of lexicon for syllabification :)
Same problem here. I changed the audio database to my language, along with the transcriptions (labels), and the result was an error during alignment. How can I adjust the language model to another language, or at least its format? Thanks.
In order for syllabification to work you have two options:
e.g.
All data for the English GA accent is stored in idlak-data/en/ga
en is the ISO standard code for English; ga is a two-letter code for the accent, but not a standard one.
So for Indonesian it would be:
id/id - assuming just one accent :-)
You can copy all the English data into both and start editing. lexicon-default.xml and sylmax-default.xml are the lexicon and the maximal onset rules (the latter let you specify valid syllable nuclei and onsets).
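As a rough illustration of the copy step, here is a minimal Python sketch. The function name `seed_language` and its parameters are hypothetical, and it assumes the layout is simply idlak-data/<language>/<accent>, so that copying the whole en tree and renaming the ga sub-directory covers both the language and the accent level. Check the actual directory contents before relying on it.

```python
import shutil
from pathlib import Path


def seed_language(data_root, src_lang="en", src_accent="ga",
                  new_lang="id", new_accent="id"):
    """Copy an existing language tree as a starting point for a new language.

    Assumption: data lives in <data_root>/<language>/<accent>. This helper is
    illustrative only and is not part of Idlak.
    """
    root = Path(data_root)
    src = root / src_lang
    dst = root / new_lang
    if dst.exists():
        # Refuse to overwrite data that has already been edited.
        raise FileExistsError(f"{dst} already exists")
    shutil.copytree(src, dst)
    if src_accent != new_accent:
        # Rename the copied accent directory, e.g. id/ga -> id/id.
        (dst / src_accent).rename(dst / new_accent)
    return dst / new_accent
```

Calling `seed_language("idlak-data")` would leave a copy of the English files under idlak-data/id/id, ready for lexicon-default.xml and sylmax-default.xml to be edited.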
Syllabification can be overridden in the lexicon, in the same way as using the pron tag, if max onset is not good for you:
Looking at it, there is a bug in max onset (it's not triggering a syllable boundary between repeating nuclei) which I will look at shortly.
v best
Matthew
There was a bug that skipped a second nucleus; this is now fixed.
By editing sylmax-default.xml you can specify which phones are syllabic and list EITHER a set of valid nuclei with or without stress (for languages which have nuclei > 1 phone).
The algorithm finds valid nuclei, then runs back to find a valid onset, then regards what's left as a coda.
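To make the three steps concrete, here is a minimal Python sketch of a maximal onset syllabifier. It is not the Idlak implementation: the function `syllabify` and the way nuclei and onsets are passed in (a set of phones and a set of tuples) are assumptions for illustration, stress is ignored, and every nucleus is treated as a single phone.

```python
def syllabify(phones, nuclei, onsets):
    """Split a phone list into syllables with a simple maximal onset rule.

    phones: list of phone strings, e.g. ["m", "a0", "a0", "f"]
    nuclei: set of phones that can be a syllable nucleus (assumed 1 phone each)
    onsets: set of tuples, each a valid onset cluster, e.g. {("s", "t")}
    """
    nucleus_idx = [i for i, p in enumerate(phones) if p in nuclei]
    if not nucleus_idx:
        return [list(phones)]  # no nucleus found: keep as one chunk

    starts = []
    prev_end = 0  # first phone that may still be claimed as an onset
    for n, i in enumerate(nucleus_idx):
        if n == 0:
            start = 0  # everything before the first nucleus joins it
        else:
            start = i
            # Run back from the nucleus while the cluster is a valid onset.
            while start > prev_end and tuple(phones[start - 1:i]) in onsets:
                start -= 1
        starts.append(start)
        prev_end = i + 1  # the next onset cannot reach into this nucleus

    # Whatever lies between a nucleus and the next onset is the coda.
    bounds = starts + [len(phones)]
    return [list(phones[bounds[k]:bounds[k + 1]]) for k in range(len(starts))]


print(syllabify(["m", "a0", "a0", "f"], {"a0"}, {("m",), ("f",)}))
print(syllabify(["m", "a0", "k", "a0", "n"], {"a0"}, {("m",), ("k",), ("n",)}))
```

Because each nucleus starts its own syllable, two adjacent nuclei (as in the 'maaf' example reported in this thread) end up in separate syllables, which is the behaviour the repeated-nuclei fix is meant to give.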
Please let me know if https://github.com/bpotard/idlak/commit/a3e85e210cbb38e683118064e838f6b9626e30a2 fixes the issue. Thanks!
This is what I get from the latest commit for cmu_slt_arctic (the synthesis process itself runs without problems):
WARNING (make-fullctx-ali-dnn:main():make-fullctx-ali-dnn.cc:244) Merge of alignment and contexts failed for key slt_arctic_a0086 mismatching number of phones contexts:28 alignment:29
WARNING (make-fullctx-ali-dnn:main():make-fullctx-ali-dnn.cc:244) Merge of alignment and contexts failed for key slt_arctic_a0438 mismatching number of phones contexts:33 alignment:34
WARNING (make-fullctx-ali-dnn:main():make-fullctx-ali-dnn.cc:244) Merge of alignment and contexts failed for key slt_arctic_a0439 mismatching number of phones contexts:31 alignment:32
WARNING (make-fullctx-ali-dnn:main():make-fullctx-ali-dnn.cc:244) Merge of alignment and contexts failed for key slt_arctic_b0244 mismatching number of phones contexts:31 alignment:32
WARNING (make-fullctx-ali-dnn:main():make-fullctx-ali-dnn.cc:244) Merge of alignment and contexts failed for key slt_arctic_b0351 mismatching number of phones contexts:27 alignment:28
WARNING (make-fullctx-ali-dnn:main():make-fullctx-ali-dnn.cc:244) Merge of alignment and contexts failed for key slt_arctic_b0391 mismatching number of phones contexts:34 alignment:35
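For anyone collecting these warnings from a longer log, a small Python sketch like the following can list the failing keys and the size of each mismatch. The helper `failed_merges` is hypothetical, and the regular expression only assumes the message format shown in the lines above.

```python
import re

# Matches the merge-failure message exactly as it appears in the log above.
MERGE_WARNING = re.compile(
    r"Merge of alignment and contexts failed for key (\S+) "
    r"mismatching number of phones contexts:(\d+) alignment:(\d+)"
)


def failed_merges(log_lines):
    """Return (key, contexts, alignment) for every merge-failure warning."""
    results = []
    for line in log_lines:
        match = MERGE_WARNING.search(line)
        if match:
            key, contexts, alignment = match.groups()
            results.append((key, int(contexts), int(alignment)))
    return results


log = [
    "WARNING (make-fullctx-ali-dnn:main():make-fullctx-ali-dnn.cc:244) "
    "Merge of alignment and contexts failed for key slt_arctic_a0086 "
    "mismatching number of phones contexts:28 alignment:29",
]
for key, contexts, alignment in failed_merges(log):
    print(key, alignment - contexts)
```

In all six warnings listed above, the alignment has exactly one more phone than the contexts.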
Hi,
I don't think the alignment merging issues in arctic are due to syllabification problems: they come from a tool (idlak_make_lang.py --mode 1) that cannot deal correctly with repeated phones (this affects every single failing example above, so it would be a real problem for a language where repeated phones are common). I'll try to fix the idlak_make_lang.py --mode 1 tool so that it uses the state-level alignment, which should let us handle duplicated phones and hopefully remove these warnings!
I modified lexicon-default.xml for my language (Indonesian) and made sure there are no duplicated phones. I converted the Indonesian phonemes to English phonemes, so all the Indonesian words are phonetized with English phones. But after running the recipe, I get these warnings and the process exits immediately.
WARNING (paste-feats:AppendFeats():paste-feats.cc:45) Length mismatch 636 vs. 0 for utt id_mmht_a0002 exceeds tolerance 1
WARNING (paste-feats:AppendFeats():paste-feats.cc:45) Length mismatch 652 vs. 0 for utt id_mmht_a0012 exceeds tolerance 1
WARNING (paste-feats:AppendFeats():paste-feats.cc:45) Length mismatch 632 vs. 0 for utt id_mmht_a0022 exceeds tolerance 1
WARNING (paste-feats:AppendFeats():paste-feats.cc:45) Length mismatch 709 vs. 0 for utt id_mmht_a0032 exceeds tolerance 1
....
Did I miss a step?
Thank you,
I am sorry. It seems the features had not been created properly, so my only problem was the feature extraction. After reading issue 8, the warnings disappeared.
The issue with idlak_make_lang.py --mode 1 has now been fixed, so I'll close this for now.
I have a small problem with the syllabification algorithm. As a quick example, in my language (Indonesian) the word 'maaf' is pronounced 'm a0 a0 f' and should be syllabified as 'm+a0 | a0+f'. With the current syllabification module, however, it is syllabified as 'm+a0+a0_f'. How can I adjust this? It results in 'Merge of alignment and contexts failed' during the alignment process, because the number of phones in the contexts and in the alignment do not match. Thanks.