MaheshVelankar opened this issue 7 years ago
In order to use mimic for a new language you will need a language model (able to do phonetic transcriptions among other things) and a speech model.
The mimic-spanish module uses an external library named Saga for phonetic transcription and syllabification; to plug it in, it uses a custom lexical insertion function.
The speech synthesis in mimic is done through a series of modules (defined in src/synth/cst_synth.c) that take care of the successive steps of the pipeline, such as tokenization, text analysis, lexical insertion, prosody prediction, and waveform generation.
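As a minimal sketch of how a voice can override one of these pipeline hooks, assuming the Flite-style API that mimic inherits (`feat_set`, `uttfunc_val`, and the `"lexical_insertion_func"` hook name); the `my_*` function names are hypothetical:

```c
#include "cst_utterance.h"
#include "cst_val.h"
#include "cst_voice.h"

/* Hypothetical custom lexical insertion hook: it receives the utterance
 * after text analysis and must fill in the word/syllable/segment
 * structure (e.g. by calling an external parser such as Saga). */
static cst_utterance *my_lexical_insertion(cst_utterance *u)
{
    /* ... build the SylStructure and Segment relations here ... */
    return u;
}

/* At voice-registration time, override the default hook so that the
 * pipeline in src/synth/cst_synth.c calls our function instead. */
void my_voice_init(cst_voice *v)
{
    feat_set(v->features, "lexical_insertion_func",
             uttfunc_val(&my_lexical_insertion));
}
```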
Depending on the speech model you have or want to build (a clustergen voice [cmu_us_slt], an HTS voice [cmu_us_slt_hts], a diphone voice [kal16]...), you will require different features from the language analysis part. For instance, in diphone voices you get a model for each possible pair of phones, and you need to provide a model for the duration of these diphones. In clustergen and HTS voices you extract more contextual features from the language model (the current phoneme to synthesize, the previous and next phonemes, the total number of syllables in the current word, the number of syllables in this word before the current phoneme, the number of words in the phrase...). These features are needed to improve the speech quality, as they are used to build a decision tree that chooses the most similar speech sample in the database (clustergen) or the most similar trained speech model (HTS).
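To make the feature extraction concrete, here is a small sketch using the Flite-style `ffeature` accessors and feature-path syntax that mimic inherits; the `word_numsyls` feature function comes from the US English feature set and other languages may name theirs differently, so treat the path as illustrative:

```c
#include <stdio.h>
#include "cst_item.h"

/* Given an item from the Segment relation, read a few of the contextual
 * features mentioned above via feature paths.  The paths navigate the
 * HRG: "p"/"n" step to the previous/next item, "R:SylStructure" jumps
 * to the same item in another relation, and "parent" climbs the tree. */
void print_segment_context(const cst_item *seg)
{
    const char *cur  = ffeature_string(seg, "name");    /* current phone  */
    const char *prev = ffeature_string(seg, "p.name");  /* previous phone */
    const char *next = ffeature_string(seg, "n.name");  /* next phone     */

    /* Syllable count of the word this segment belongs to:
     * segment -> its syllable -> its word, then apply word_numsyls. */
    int numsyls = ffeature_int(seg, "R:SylStructure.parent.parent.word_numsyls");

    printf("phone %s (prev %s, next %s); word has %d syllables\n",
           cur, prev, next, numsyls);
}
```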
Mimic (a fork of Flite, in turn derived from Festival) uses the Heterogeneous Relation Graph (HRG) formalism to define and extract those linguistic features, sharing as much as possible between languages. The HRG is defined in src/hrg and states that a text is divided into Utterances ("paragraphs"), where each utterance can be synthesized independently. Each utterance has a set of Relations, which contain Items; for example, an utterance typically has relations such as Word, Syllable, SylStructure, and Segment.
For Spanish support, I tried to keep Spanish close to the HRG structure in mimic so I could reuse all the language-independent functions that mimic offers to extract the HTS and clustergen features, so I inserted the syllabification and phonetic transcription information from the Saga transcription tool into the SylStructure and Segment relations. Feel free to use it as an example.
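As a rough illustration of what such an insertion looks like with mimic's HRG API (`utt_relation_create`, `relation_append`, `item_add_daughter`, `item_set_string`), here is a hedged sketch; the hard-coded syllabification stands in for what the external parser would actually return:

```c
#include "cst_item.h"
#include "cst_relation.h"
#include "cst_utterance.h"

/* Toy lexical insertion: for every word in the Word relation, attach a
 * (made-up) two-syllable transcription "o" + "la" to the SylStructure,
 * Syllable and Segment relations.  A real implementation would call the
 * external parser here instead of using fixed phones. */
static cst_utterance *toy_lexical_insertion(cst_utterance *utt)
{
    cst_relation *sylstruct = utt_relation_create(utt, "SylStructure");
    cst_relation *sylrel    = utt_relation_create(utt, "Syllable");
    cst_relation *segrel    = utt_relation_create(utt, "Segment");
    cst_item *word;

    for (word = relation_head(utt_relation(utt, "Word"));
         word; word = item_next(word)) {
        /* The word becomes a root node in SylStructure. */
        cst_item *ssword = relation_append(sylstruct, word);

        /* Phones per syllable, NULL-terminated (illustrative values). */
        const char *syls[2][3] = { { "o", NULL, NULL }, { "l", "a", NULL } };

        for (int i = 0; i < 2; i++) {
            /* New syllable: appended to Syllable, daughter of the word. */
            cst_item *syl =
                item_add_daughter(ssword, relation_append(sylrel, NULL));
            for (int j = 0; syls[i][j] != NULL; j++) {
                /* New segment: appended to Segment, daughter of the syllable. */
                cst_item *seg = relation_append(segrel, NULL);
                item_set_string(seg, "name", syls[i][j]);
                item_add_daughter(syl, seg);
            }
        }
    }
    return utt;
}
```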
Do you have a speech model already, or some recordings? I know there is some Indic support in mimic (inherited from Flite), but I don't know whether the speech models (voices) work well or not.
Thanks for the very prompt response and the detailed description (which was also very easy to understand).
I am able to build and use an HTS voice in a TTS system that uses Festival. I already have the parser used for training and synthesis, which converts text into syllables and phonemes. I will study how the Spanish module uses its external parser and try to work along similar lines.
I do not work on this continuously, so please excuse me if my further questions are delayed.
Thanks again
Just a couple more things:
I also work on this whenever I can, so we can ask and answer each other's questions as time allows. No pressure 👍
Best
I don't know how helpful this would be, but if you're looking for something that could handle tokenisation, Apertium has a FOSS morphological analyser for Marathi - https://sourceforge.net/p/apertium/svn/HEAD/tree/languages/apertium-mar/
This is a question rather than an issue.
I wish to use this utility for the Marathi language, for which I have my own parser to convert Unicode (UTF-8) Devanagari text into phonemes.
I want to know how I can use it. Which part of the code should I look into in order to modify it to use my parser instead of whatever parser this utility currently uses?
I have just started looking into this utility, so please excuse me if this is too trivial.
Thanks in advance