Open evelynyhc opened 3 years ago
Pretty much you need to modify the TextMelLoader object to parse phonemes instead of text. Note that the text is converted into a torch.IntTensor of integers. The TextMelLoader object does this conversion expecting textual input:
```python
def get_text(self, text):
    text_norm = torch.IntTensor(text_to_sequence(text, self.text_cleaners))
    return text_norm
```
So you'll need to come up with your own system to map the phonemes to integers.
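As a starting point, here is a minimal sketch of such a mapping, assuming the phonemes arrive as space-separated tokens (the symbol inventory and function names below are illustrative, not part of the repo):

```python
# Illustrative phoneme inventory; in practice, list your full phoneme set
# plus any punctuation symbols you want the model to see.
PHONEMES = ["_pad", "HH", "AH", "L", "OW", "W", "ER", "D"]
PHONEME_TO_ID = {p: i for i, p in enumerate(PHONEMES)}

def phonemes_to_sequence(phoneme_string):
    """Convert a space-separated phoneme string to a list of integer ids."""
    return [PHONEME_TO_ID[p] for p in phoneme_string.strip().split()]

# In TextMelLoader.get_text you would then wrap the result, e.g.:
#     text_norm = torch.IntTensor(phonemes_to_sequence(text))
```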
I am interested in phoneme training too. Could you please give some more detail, or maybe point to an example project?
I don't have an example project I can share unfortunately.
The detail really depends on your implementation of the phonemes. For example if you use the CMU phoneme set you might have a "text input" that looks like:
{HH AH L OW} {W ER L D}
You would need to map the CMU phonemes (HH, AH, L, etc.) to integers (1, 2, 3, etc.) in the data loader. Don't forget about punctuation either. The repo is already doing something like this, except it maps alphabetic characters (a, b, c) to integers - so that is a good starting point.
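One way to sketch this (a hedged example, not the repo's actual code): parse the `{HH AH L OW} {W ER L D}` style input into integer ids, treating curly braces as word delimiters and inserting a space symbol between words. The symbol table here is illustrative and omits most of the CMU set and punctuation handling.

```python
import re

# Illustrative symbol table: padding, a space, some punctuation, and a
# handful of CMU phonemes. A real table would cover the full CMU set.
SYMBOLS = ["_pad", " ", ".", ",", "?", "!",
           "HH", "AH", "L", "OW", "W", "ER", "D"]
SYMBOL_TO_ID = {s: i for i, s in enumerate(SYMBOLS)}

def cmu_text_to_ids(text):
    """Map a {phoneme ...}-annotated string to a list of integer ids."""
    ids = []
    for i, word in enumerate(re.findall(r"\{([^}]*)\}", text)):
        if i > 0:
            ids.append(SYMBOL_TO_ID[" "])  # space between words
        ids.extend(SYMBOL_TO_ID[p] for p in word.split())
    return ids
```

For example, `cmu_text_to_ids("{HH AH L OW} {W ER L D}")` yields one id per phoneme with a space id between the two words.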
Hello dear @EuphoriaCelestial, could you please tell me how to start phoneme training (Tacotron 2) for a language other than English? For more information, the letters of my language are (ئابپتجچحخرڕزژسشعغفڤقكگلڵمنودۆهەوویێصطث). Please help.
@sabat84 I am sorry that I can't help, because I don't know how to do that either. I have switched to FastSpeech 2, which supports phoneme training.
@johnjaniczek what is the text_to_sequence function? Should I define it myself, or is there a definition of it in the repo?
First, thank you for your excellent work. I have a question about how to use phonemes to train models - not in other works, only in this tacotron2.