CSTR-Edinburgh / merlin

This is now the official location of the Merlin project.
http://www.cstr.ed.ac.uk/projects/merlin/
Apache License 2.0

How to change the text to hts_lab #211

Closed Jiff-Zhang closed 6 years ago

Jiff-Zhang commented 7 years ago

Hi, guys. I am now using Merlin for Chinese training. I have prepared the prosody_label as below:

#
0.282300 125 sil
0.134400 125 x
0.288550 125 iang
0.110800 125 x
0.204150 125 iang
0.034000 125 g
0.199600 125 ang

However, I am confused about how to convert this to an hts_label. I have read merlin/misc/script/frontend/festival_utt_to_lab, but I am still having trouble.

How can I generate the hts_lab, and what do I need to prepare for it? It seems that I need to generate a festival_utt first. Once I have the festival_utt, how do I use merlin/misc/script/frontend/festival_utt_to_lab? Could you please give me some advice? I would sincerely appreciate it.
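For illustration only, here is a minimal sketch of one possible first step: turning the label triples quoted above into HTS-style monophone label lines (`start end phone`, with times in units of 100 ns). It assumes that the first column is a per-segment duration in seconds (the values are not monotonically increasing, so they do not look like end times) and that the second column (`125`) can be ignored. Both are assumptions about the data, and the function name is hypothetical, not part of Merlin or HTS.

```python
def durations_to_hts_mono(raw: str) -> list:
    """Convert 'seconds colour phone' triples into 'start end phone' lines.

    Assumption: the first field of each triple is a duration in seconds.
    HTS label times are integers in units of 100 ns (1 s = 10,000,000 units).
    """
    # Drop the '#' separator and split everything on whitespace, so the
    # input may be one long line or one triple per line.
    tokens = raw.replace("#", " ").split()
    if len(tokens) % 3 != 0:
        raise ValueError("expected a whole number of 'time colour phone' triples")

    lines = []
    start = 0
    for i in range(0, len(tokens), 3):
        seconds, _colour, phone = tokens[i:i + 3]
        end = start + round(float(seconds) * 10_000_000)
        lines.append(f"{start} {end} {phone}")
        start = end
    return lines


if __name__ == "__main__":
    sample = "# 0.282300 125 sil 0.134400 125 x 0.288550 125 iang"
    print("\n".join(durations_to_hts_mono(sample)))
```

This only produces time-stamped monophone lines; the full-context part of each HTS label still has to be built from the linguistic analysis of the text.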

candlewill commented 7 years ago

I am also working on Chinese TTS using Merlin. If there is a tutorial for Chinese, that would be very helpful.

chazo1994 commented 7 years ago

I don't understand why you need to convert text to hts_lab via Merlin. You can use Festival to create labels for HTS, but I don't know whether Festival supports Chinese (for Vietnamese, it does not). Alternatively, you can create HTS labels yourself by following this structure: http://www.cs.columbia.edu/~ecooper/tts/lab_format.pdf (this structure is for English; for Chinese you will have to work it out yourself). I usually convert labels from HTS to Merlin's structure to train with Merlin.
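As a rough sketch of what creating such labels by hand involves, the English HTS label format begins each line with a quinphone context of the form `p1^p2-p3+p4=p5` (two phones back, one back, current, one ahead, two ahead). The snippet below builds only that leading part from a phone sequence. The function name and the example phones are illustrative, and the remaining fields (`/A:`, `/B:` and so on) are left out because their content for Chinese would have to be designed separately.

```python
def quinphone_contexts(phones, pad="x"):
    """Return one 'p1^p2-p3+p4=p5' string per phone in the sequence.

    'pad' fills positions that fall outside the utterance; the English
    format uses 'x' for undefined context.
    """
    padded = [pad, pad] + list(phones) + [pad, pad]
    contexts = []
    for i in range(2, len(padded) - 2):
        ll, l, c, r, rr = padded[i - 2:i + 3]
        contexts.append(f"{ll}^{l}-{c}+{r}={rr}")
    return contexts


if __name__ == "__main__":
    for line in quinphone_contexts(["sil", "g", "ang", "sil"]):
        print(line)
```

One thing to watch for Mandarin: the pinyin initial `x` is spelled the same as the `x` placeholder, so a different padding symbol or a renamed phone may be needed to keep the two apart.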

Jiff-Zhang commented 7 years ago

@chazo1994 Thanks for your reply. For Chinese labels, though, I am confused about the format, how to generate the parameters, and how to write the question file. If there are any files showing how Festival creates labels for English, and the rules for writing a question file, that would be helpful.

chazo1994 commented 7 years ago

@willian56 A question file uses regular-expression-style patterns to match phonemes (to cluster states with a decision tree in HTS), following the structure of the label files. To create a question file, you first have to understand the linguistic features of your language, for example:

(1) Sub-syllable (current sub-syllable, the preceding one and two sub-syllables, and the succeeding one and two sub-syllables): initial/final, final with medial, long model, articulation category of the initial, and pronunciation category of the final.
(2) Syllable: the number of sub-syllables in a syllable and the position of the syllable in the note.
(3) Phrase: the number of sub-syllables/syllables in a phrase.
(4) Song: the average number of sub-syllables/syllables in each measure of the song and the number of phrases in the song.

These follow the paper "HMM-based Mandarin Singing Voice Synthesis Using Tailored Synthesis Units and Question Sets".

Second, to create the question file yourself, you can use a script from HTS (download the HTS demo): makequestion.pl in data/scripts. You also have to modify en_US.talk.conf in data/configs to match your language's questions.
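To make the idea of a question file more concrete, here is a small sketch, independent of makequestion.pl, that writes HTS-style `QS` lines for the five quinphone positions of a `p1^p2-p3+p4=p5` label. The phone groups are made-up placeholders rather than a real Mandarin inventory, and the function name is hypothetical. The wildcard matching is imitated with Python's `fnmatch`, which is only an approximation of how HTS or Merlin evaluates the patterns.

```python
from fnmatch import fnmatchcase

# Wildcard templates for each position in a 'p1^p2-p3+p4=p5@...' label.
TEMPLATES = {
    "LL": "{p}^*",
    "L": "*^{p}-*",
    "C": "*-{p}+*",
    "R": "*+{p}=*",
    "RR": "*={p}@*",
}


def make_questions(groups):
    """Build one QS line per (position, phone group) pair."""
    lines = []
    for position, template in TEMPLATES.items():
        for name, phones in groups.items():
            patterns = ",".join(template.format(p=p) for p in phones)
            lines.append(f'QS "{position}-{name}" {{{patterns}}}')
    return lines


if __name__ == "__main__":
    # Hypothetical grouping, for illustration only.
    groups = {"Nasal_Final": ["ang", "iang"], "Silence": ["sil"]}
    for line in make_questions(groups):
        print(line)

    # Check a pattern against the start of a label by wildcard matching.
    print(fnmatchcase("sil^g-ang+sil=x", "*-ang+*"))
```

Each question asks whether a label matches any of the listed patterns, so the groups should reflect distinctions (initial/final type, tone, position) that are expected to matter acoustically in the target language.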

To generate labels, you can use Festival or not. I do not use Festival for my language. I make labels that follow the label structure I gave you above (for my language I use a modified structure). To do that, you can use a part-of-speech (POS) tagger, word segmentation, word chunking, and ToBI to label the tone.

This slide deck may help you for Chinese: "Mandarin TTS using HTS toolkit". Labels and questions in HTS are the same as in Merlin (after alignment).

Jiff-Zhang commented 7 years ago

@chazo1994 Thanks for your advice, it really helps.