anonymous-pits / pits

PITS: Variational Pitch Inference for End-to-end Pitch-controllable TTS without External Pitch Predictor
https://anonymous-pits.github.io/pits/
MIT License

Training with my own Korean dataset #21

Open myyhlee opened 1 year ago

myyhlee commented 1 year ago

I am trying to train using my own Korean single-speaker dataset. Could you give me some instructions on how to structure the .txt files in the filelist?

anonymous-pits commented 1 year ago

Hi! Our implementation already contains a basic Korean setup, so you only need to do a few things.

First, convert your KSS text to metadata in a format similar to ours. For this step, apply an external g2p (the phoneme output should be jamo too, not IPA or another phoneme set).
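For anyone unsure what "jamo" output looks like, here is a minimal sketch of the decomposition step only. It assumes the text has already been passed through an external g2p (so it is written as pronounced) and that conjoining jamo (the U+1100 block produced by Unicode NFD normalization) are acceptable; the repository may expect a different jamo inventory, so check its symbol table first. The helper names `to_jamo` and `from_jamo` are hypothetical and not part of this repository.

```python
import unicodedata


def to_jamo(text: str) -> str:
    """Decompose precomposed Hangul syllables into conjoining jamo.

    NFD normalization splits each syllable into its initial consonant,
    vowel, and optional final consonant. Non-Hangul ASCII text is unchanged.
    This is NOT a g2p: it applies no pronunciation rules.
    """
    return unicodedata.normalize("NFD", text)


def from_jamo(jamo: str) -> str:
    """Recompose conjoining jamo back into precomposed syllables (NFC)."""
    return unicodedata.normalize("NFC", jamo)


sample = "한국어"
jamo = to_jamo(sample)
# Print code points so the decomposition is visible regardless of font support.
print([f"U+{ord(ch):04X}" for ch in jamo])
```

The round trip through `from_jamo` is a cheap sanity check that nothing was lost when writing the metadata.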

Second, change the data section of the config files, including the metadata path, speakers (a list with a single entry), and languages ("ko_KR").
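As a rough illustration of the config change, the sketch below patches a config dictionary in Python. The key names (`data`, `metadata_path`, `speakers`, `languages`), the example path, the speaker name, and the choice of a list for `languages` are all assumptions made for illustration; use the actual keys and value types from the repository's config files.

```python
import json


def patch_data_section(config: dict, metadata_path: str, speaker: str, language: str) -> dict:
    """Return a copy of `config` with its data section adapted to one speaker.

    All key names here are hypothetical placeholders, not the repository's schema.
    """
    patched = dict(config)
    data = dict(patched.get("data", {}))
    data["metadata_path"] = metadata_path  # hypothetical key for the metadata file
    data["speakers"] = [speaker]           # list with a single entry
    data["languages"] = [language]         # assumed to be a list; may be a plain string
    patched["data"] = data
    return patched


base_config = {"data": {}, "train": {}}
example = patch_data_section(
    base_config,
    metadata_path="filelists/my_korean_metadata.txt",  # hypothetical path
    speaker="my_speaker",                              # hypothetical speaker name
    language="ko_KR",
)
print(json.dumps(example, indent=2, ensure_ascii=False))
```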

Then run training! Please leave another comment if you need more help.

p0p4k commented 1 year ago

Hello, why not IPA? I tried jamo and it works well, but shouldn't IPA work as well? Or, in your experience, is IPA worse than jamo for Korean? Thanks.

anonymous-pits commented 12 months ago

It's because our internal g2p systems are not IPA-based. IPA is a good, universal representation, so if you have a good g2p model for IPA, you can use it. I don't think the choice of phoneme system matters much for a single-language system (it could for multilingual TTS), as long as it carries enough information to represent the language's phonemes.