ga642381 / FastSpeech2

Multi-Speaker Pytorch FastSpeech2: Fast and High-Quality End-to-End Text to Speech :fist:

Is it able to learn a certain voice style? #5

Open lucasjinreal opened 2 years ago

lucasjinreal commented 2 years ago

Is it able to learn a certain voice style?

ga642381 commented 2 years ago

Hi, thanks for your question. This repo doesn't support learning voice style for now; we would probably need to add a style encoder for that. Recently we have instead been focusing on multilingual TTS, such as supporting Chinese, Taiwanese, and so on.

lucasjinreal commented 2 years ago

@ga642381 Hi, can multilingual TTS achieve performance comparable to single-language TTS? Wouldn't the phone space be very large?

ga642381 commented 2 years ago

I agree with you. A collaborator on this repo, Wei-Ping Huang, has done some research on using self-supervised features to learn phonetic information shared across languages (ref: Few-Shot Cross-Lingual TTS Using Transferable Phoneme Embedding, https://arxiv.org/abs/2206.15427).
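To make the phone-space concern concrete, here is a minimal sketch comparing two ways of building a multilingual phoneme vocabulary. It is only an illustration: the inventories are toy, hypothetical symbol lists (not real linguistic data), the function names are made up for this example, and simple symbol sharing is a much cruder idea than the self-supervised approach in the paper above. It is also not how this repo builds its vocabulary.

```python
# Toy, hypothetical phoneme inventories (illustrative only, not real data).
INVENTORIES = {
    "en": ["a", "i", "u", "k", "s", "th"],
    "zh": ["a", "i", "u", "k", "s", "zh"],
    "tw": ["a", "i", "u", "k", "ng"],
}


def language_specific_vocab(inventories):
    """One entry per (language, phoneme) pair, so the vocabulary grows with every language."""
    return sorted(
        f"{lang}:{phone}"
        for lang, phones in inventories.items()
        for phone in phones
    )


def shared_vocab(inventories):
    """One entry per distinct symbol, so overlapping phonemes are reused across languages."""
    return sorted({phone for phones in inventories.values() for phone in phones})


separate = language_specific_vocab(INVENTORIES)
shared = shared_vocab(INVENTORIES)

# Each vocabulary entry would need its own row in the phoneme embedding table.
print(f"language-specific entries: {len(separate)}")
print(f"shared entries: {len(shared)}")
```

With language-tagged symbols the embedding table needs one row per language and phoneme, while a shared symbol set only needs one row per distinct phoneme. Whether sharing rows actually helps depends on how similar the sounds really are across languages.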

As for this repo, I think we can at least support datasets for various languages, to make it friendlier for the community to do multi-speaker, multilingual TTS research.