NVIDIA / tacotron2

Tacotron 2 - PyTorch implementation with faster-than-realtime inference
BSD 3-Clause "New" or "Revised" License

how to use phonemes to train the model? #442

Open evelynyhc opened 3 years ago

evelynyhc commented 3 years ago

First, thank you for your excellent work. I have a question: how can I use phonemes to train the model? I am asking specifically about this tacotron2 implementation, not other projects.

johnjaniczek commented 3 years ago

Pretty much, you need to modify the TextMelLoader object to parse phonemes instead of text. Note that the model does not see the text itself: the text is converted to a sequence of integer IDs stored in a torch.IntTensor. The TextMelLoader object does this conversion, and it expects textual input:

    def get_text(self, text):
        text_norm = torch.IntTensor(text_to_sequence(text, self.text_cleaners))
        return text_norm

So you'll need to come up with your own system to map the phonemes to integers.
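One way such a mapping might be built is sketched below. This is only an illustration, not code from the repo: `build_symbol_table`, `phonemes_to_sequence`, and the `_` padding symbol are hypothetical names, and the input is assumed to be space-separated phoneme strings.

```python
from typing import Dict, Iterable, List

PAD = "_"  # hypothetical padding symbol, reserved as ID 0


def build_symbol_table(transcripts: Iterable[str]) -> Dict[str, int]:
    """Assign a stable integer ID to every phoneme seen in the transcripts."""
    # Sorting makes the IDs reproducible across runs on the same data.
    symbols = sorted({tok for line in transcripts for tok in line.split()})
    table = {PAD: 0}
    for sym in symbols:
        table.setdefault(sym, len(table))
    return table


def phonemes_to_sequence(phonemes: str, table: Dict[str, int]) -> List[int]:
    """Convert a space-separated phoneme string to a list of integer IDs.

    Raises KeyError for a phoneme that is not in the table.
    """
    return [table[tok] for tok in phonemes.split()]


transcripts = ["HH AH L OW", "W ER L D"]
table = build_symbol_table(transcripts)
ids = phonemes_to_sequence("HH AH L OW", table)
print(ids)
```

Inside get_text, the resulting list could then be wrapped in torch.IntTensor in place of the text_to_sequence call. The model's symbol embedding would also need to be sized to the number of entries in the table, and the same table has to be used at training and inference time.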

EuphoriaCelestial commented 3 years ago

I am interested in phoneme training too. Can you please give some more detail, or maybe an example project?

johnjaniczek commented 3 years ago

I don't have an example project I can share unfortunately.

The details really depend on how you represent the phonemes. For example, if you use the CMU phoneme set, you might have a "text input" that looks like: {HH AH L OW} {W ER L D}. You would need to map the CMU phonemes (HH, AH, L, etc.) to integers (1, 2, 3, etc.) in the data loader. Don't forget about punctuation either. The repo already does something like this, except that it maps alphabetic characters (a, b, c) to integers, so that is a good starting point.
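A minimal sketch of parsing that brace-delimited format is shown below, assuming a tiny hand-written inventory. `SYMBOLS` and `cmu_text_to_sequence` are hypothetical names for illustration, not the repo's API, and a real setup would list the full phoneme set.

```python
import re
from typing import List

# Hypothetical inventory: padding, space, punctuation, then a few CMU phonemes.
SYMBOLS = ["_", " ", "!", ",", ".", "?", "HH", "AH", "L", "OW", "W", "ER", "D"]
SYMBOL_TO_ID = {s: i for i, s in enumerate(SYMBOLS)}

# Matches either a {...} phoneme group or any single character outside braces.
_TOKEN_RE = re.compile(r"\{([^}]*)\}|(.)", re.DOTALL)


def cmu_text_to_sequence(text: str) -> List[int]:
    """Map brace-delimited phoneme groups, spaces, and punctuation to IDs."""
    ids: List[int] = []
    for match in _TOKEN_RE.finditer(text):
        group, char = match.group(1), match.group(2)
        if group is not None:
            # Phonemes inside braces are separated by whitespace.
            # An unknown phoneme raises KeyError so typos surface early.
            ids.extend(SYMBOL_TO_ID[p] for p in group.split())
        elif char in SYMBOL_TO_ID:
            ids.append(SYMBOL_TO_ID[char])
        # Characters outside the inventory are skipped silently here.
    return ids


sequence = cmu_text_to_sequence("{HH AH L OW} {W ER L D}!")
print(sequence)  # [6, 7, 8, 9, 1, 10, 11, 8, 12, 2]
```

The space between the two groups and the trailing "!" get their own IDs, so word boundaries and punctuation are preserved. If plain letters were also added to the inventory, phoneme symbols such as L would need to be distinguished from the letter, for example by giving them a prefix.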

sabat84 commented 2 years ago

Hello @EuphoriaCelestial, could you please tell me how to start phoneme training (Tacotron 2) for a language other than English? For reference, the letters of my language are (ئابپتجچحخرڕزژسشعغفڤقكگلڵمنودۆهەوویێصطث). Please help.

EuphoriaCelestial commented 2 years ago

@sabat84 I am sorry that I can't help, because I don't know how to do that either. I have switched to FastSpeech 2, which supports phoneme training.

Suryansh-Dey commented 10 months ago

@johnjaniczek What is the text_to_sequence function? Should I define it myself, or is there a definition of it in the repo?