festvox / festival

Festival Speech Synthesis System

Flite file or the voice dump generated from cg festvox voice does not work #60

Open plehal opened 3 years ago

plehal commented 3 years ago

Hello Sai, I was able to successfully create a cg voice with your script. The voice does work in festival. However, now I may have more questions than you have already answered. The flite build process completed (with some glitches). The xxxxxphonestate.c file had multiple definitions (7) of one phone, 'pa'. The compile succeeded after commenting out 6 of the 7 duplicate definitions. But the resulting flite executable does not generate any speech. It gives the error "Error mlgparaChol: Different dimension". What does this mean, and how can I resolve this issue?
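For anyone hitting the same duplicate-definition problem, a minimal sketch of how the repeated lines in a generated C file could be located before commenting them out by hand. It assumes the duplicates are textually identical lines that mention the phone name; the function name `duplicated_lines` and the sample text are invented for illustration, and the real layout of the generated phonestate file may differ.

```python
from collections import Counter


def duplicated_lines(source_text, token):
    """Return {line: count} for lines containing `token` that occur more than once.

    Lines are compared after stripping leading/trailing whitespace.
    `token` is matched as a plain substring (an assumption for this sketch).
    """
    counts = Counter(
        line.strip()
        for line in source_text.splitlines()
        if token in line
    )
    return {line: n for line, n in counts.items() if n > 1}


# Hypothetical sample text; this is NOT the real generated file format.
sample = "int pa_1;\nint pa_1;\nint pau_1;\n  int pa_1;\n"
print(duplicated_lines(sample, "pa"))
```

To use it on a real file, read the file's contents into a string and pass the phone name as `token`. Because the match is a substring match, a short token such as `pa` also matches longer names such as `pau`, but those are reported only if they are themselves repeated.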

saikrishnarallabandi commented 3 years ago

Interesting error. I think this is because flite thinks the 'str' feature was used to build the voice, but in the script I gave you, 'str' (strength of excitation) is not used. Let me get back to you.

Tagging @Alan Black. Link to the previous discussion: https://github.com/festvox/festival/issues/13#issuecomment-844072692


plehal commented 3 years ago

OK. I thought that while you are looking into this issue, I would create some more. I took the same input wav files and ran them through highpass/lowpass filters (300, 1000). The voice was created along with the corresponding flite voice. This time it found 113 channels compared to the previous 103, whatever that means. However, this time flite did not give the same error; instead it complained: "oss_audio: failed to open audio device /dev/dsp". Please keep in mind that nothing has changed in the environment (the system uses pulseaudio and other voices work just fine with it). The machine has not even been rebooted. On the positive side, flite did create a wave file with the -o option.

So, I have questions/concerns about the robustness of the process. How can we speed up the rf build process? The cg voice built using the script takes longer to process text in festival. Any specific reasons for that? Also, there is a little quality drop from the festival to the flite voice output; is there any way to improve that?

Thanks for all the help. Now that the process has created one working voice, I can record better quality prompts and create a better voice.
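The workaround mentioned above, writing a wave file with -o instead of playing through the audio device, can be scripted. The following is a rough sketch, not a definitive recipe: it assumes a flite binary that accepts the standard `-t` (text), `-o` (output wave file) and `-voice` options, and the helper names, the `binary` parameter and the example file names are hypothetical.

```python
import shutil
import subprocess


def flite_command(text, out_wav, voice=None, binary="flite"):
    """Build the argument list for synthesizing `text` into `out_wav`.

    `binary` may be a plain "flite" or the path to a voice-specific
    executable produced by the build (path is up to the caller).
    """
    cmd = [binary]
    if voice is not None:
        cmd += ["-voice", voice]
    cmd += ["-t", text, "-o", out_wav]
    return cmd


def synthesize_to_file(text, out_wav, voice=None, binary="flite"):
    """Run flite and write a wave file, bypassing the audio device."""
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"{binary} executable not found")
    subprocess.run(flite_command(text, out_wav, voice, binary), check=True)


# Example (hypothetical file names):
# synthesize_to_file("hello world", "hello.wav", voice="./my_voice.flitevox")
```

The resulting file can then be played with whatever player works on the system, which sidesteps the /dev/dsp error without explaining it.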

plehal commented 3 years ago

Is there any book or documentation detailing the inner workings of the flite tts/data model? I want to fine-tune certain phone durations for a few words. Please advise.

saikrishnarallabandi commented 3 years ago

Does this help?

http://www.festvox.org/flite/doc/flite.pdf


plehal commented 3 years ago

No, this document does not provide much insight into the inner workings of the flite data structures or what can or should be tweaked (if anything). For example, the voice I generated and am trying to tweak produces a funny sound for the word (m aInas); however, (m aI), or m with any other phones like A, i, i:, aU etc., works just fine. Similarly, aInas works fine with other letters. I tried tweaking the durmodel.c file, but in vain. That is why I wanted to understand the relationship between the various files/data structures.