festvox / festival

Festival Speech Synthesis System

build_prompts_waves not finding utf-8 word in lexicon #26

Open roedoejet opened 4 years ago

roedoejet commented 4 years ago

I am building a clunits synthesizer and am getting the following error when I run ./bin/do_build build_prompts_waves:

Unknown word wa'òn:swahte
./bin/do_build: line 157:  1415 Segmentation fault: 11  $ESTDIR/../festival/bin/festival --heap $HEAPSIZE -b festvox/build_clunits.scm '(build_prompts_waves "'$PROMPTFILE'")'

The word is defined in my etc/txt.done.data file as:

( kawe0002 "wa'òn:swahte'" )

and in festvox/nrc_moh_am_lexicon.scm as:

(lex.add.entry '( "wa'òn:swahte'" nil (((W A X OONL) 0) ((S W A H) 0) ((T E X) 0))))

The problem goes away if I take out all of the non-ASCII characters, but I would prefer to keep them in. Is there some way to configure Festival to support the character set of my language?

Thank you.
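To see which prompt or lexicon lines are affected, one option is to list every line that contains a byte outside printable ASCII. This is only a rough sketch: `list_non_ascii_lines` is a hypothetical helper name, not part of Festival or festvox, and it assumes a `grep` that accepts `-a` (both GNU and BSD grep do). Tabs are also reported, since they fall outside the space-to-tilde range.

```shell
# Hypothetical helper (not part of Festival/festvox):
# print "line-number:text" for each line holding a byte outside printable ASCII.
# LC_ALL=C makes grep compare byte by byte; -a forces text mode.
list_non_ascii_lines() {
    LC_ALL=C grep -a -n '[^ -~]' "$1"
}
```

It could be run as `list_non_ascii_lines etc/txt.done.data` and again on festvox/nrc_moh_am_lexicon.scm to compare the two lists.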

ddavout commented 4 years ago

Is it an Indic voice? (If so, look at https://github.com/festvox/festival/issues/13#issuecomment-414084297.) Are you using a grapheme-based voice build? If not, what have you done to make it work? Do you use utf8explode anywhere?

roedoejet commented 4 years ago

No, it is a Canadian Indigenous language, Kanyen'kéha. I have not used the grapheme-based voice build.

I'm using the phonetic alignment trick here: http://tts.speech.cs.cmu.edu/11-823/hints/clock.html and mapping it to English.

Is there documentation or a tutorial on the Grapheme based voice build? Will it perform better than just mapping to English?

ddavout commented 4 years ago

I once used a grapheme-based voice in UTF-8 for French. It was just an experiment with a few prompts and no lexicon. I had only one tiny problem (I can't remember which), and I found the result quite astonishing. I read https://github.com/festvox/festvox/blob/master/docbook/grapheme.sgml and followed festvox/src/grapheme/build_cg_grapheme_voice.

Grep your files for symbolexplode, and use utf8explode instead.
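A minimal sketch of that search, assuming the voice's Scheme files live under a directory such as festvox/. The function name `find_symbolexplode` is made up for illustration. It only lists the matches; swapping in utf8explode is left as a manual edit so each call site can be checked.

```shell
# Hypothetical helper: print "file:line:text" for every use of symbolexplode
# found anywhere under the given directory.
find_symbolexplode() {
    grep -rn 'symbolexplode' "$1"
}
```

For example, `find_symbolexplode festvox` run from the root of the voice directory.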

What is your default voice? I found that a segmentation fault can come from the use of poslex; I had to remove it temporarily to see what the real problem was.

roedoejet commented 4 years ago

I believe my default voice is kal_diphone. I tried to follow the directions, but the label step (./bin/do_build label etc/txt.done.data) fails, and it looks like it didn't actually create the label files. Here's the error from my console (it produces this same error for all 852 files):

Accepted sentences are: 852 / 852
GENERATING SHORT PAUSE: AT 12 iteration
I am trying to middle state of pause model short silence model...
I assumed short pause to be of 3 states rather it is 5
Aborting...
EHMM align
NO of words: do not match: TW: 32731 13
EHMM standardize_statenames
mv: rename lab/kawe0001.sl to lab/kawe0001.slehmm: No such file or directory
awk: can't open file lab/kawe0001.slehmm

roedoejet commented 4 years ago

This error is on macOS Mojave. I ran it in a Docker container running Ubuntu and it produced a similar error:

mv: cannot stat 'lab/kawe0845.sl': No such file or directory

ddavout commented 4 years ago

I have had this one before. I can't remember what the problem was, but I do remember that some EHMM files are not regenerated, ehmm/etc/ph_list for example. They are kept because they are useful for optimization (later, in cg_test), but when they are wrong they stay wrong.

ddavout commented 4 years ago

Is your ehmm/model101.txt not even created? Are there lines of nan nan nan ... nan in the mvar.txt at the root of your voice dir? That would mean ehmm is just running hopelessly. Don't ask why there is a comment preventing a new ehmm setup when you are using the same build with a new prompt file: https://github.com/festvox/festvox/blob/ec3e8b4ee739447b7c2a062506759049d4ab1b5a/src/general/do_build#L71

To try with a new setup, decrease {no_max_iterations} from 30 to, say, 10. It worked for me.
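The two symptoms asked about above can be checked from the shell. The following is only a sketch: `count_nan_lines` and `check_ehmm_state` are hypothetical names, and the file locations (ehmm/model101.txt and mvar.txt under the voice directory) are taken from this comment rather than from the festvox documentation.

```shell
# Hypothetical helper: count the lines of a file that contain the token "nan".
count_nan_lines() {
    grep -c -w 'nan' "$1"
}

# Hypothetical helper: report whether ehmm/model101.txt exists and how many
# lines of mvar.txt contain nan, for the voice directory given as $1.
check_ehmm_state() {
    voice_dir=$1
    if [ -f "$voice_dir/ehmm/model101.txt" ]; then
        echo "model101.txt: present"
    else
        echo "model101.txt: missing"
    fi
    if [ -f "$voice_dir/mvar.txt" ]; then
        echo "mvar.txt nan lines: $(count_nan_lines "$voice_dir/mvar.txt")"
    else
        echo "mvar.txt: missing"
    fi
}
```

Running `check_ehmm_state .` from the root of the voice directory prints one line per check.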

roedoejet commented 4 years ago

Yes, this worked, thank you! I changed line 205 in $FESTVOXDIR/src/ehmm/bin/do_ehmm to:

$EHMMDIR/bin/ehmm ehmm/etc/ph_list.int ehmm/etc/txt.phseq.data.int 1 0 ehmm/binfeat scaledft ehmm/mod 0 0 0 10 $num_cpus

I think the naturalness is not of the quality I would like for the specific application I'm working on. But it's cool nonetheless.

As for natively supporting utf-8 characters in building a clunits voice - can I assume this is not on the plan for future development?