WelkinYang / Learn2Sing2.0

Diffusion and Mutual Information-Based Target Speaker SVS by Learning from Singing Teacher
https://welkinyang.github.io/Learn2Sing2.0/

Regarding English dataset #4

Closed dutchsing009 closed 2 years ago

dutchsing009 commented 2 years ago

Hi, great work here! I wanted to know whether this repo will work on an English dataset, and whether there are any English samples I could listen to in order to judge the quality. If it does work on English data, what exactly should I do? For example, for the step "Replace the phoneset and pitchset in text/symbols.py with your own set", what would that look like for English? Also, "Provide the path to the data in config.json" is clear, but what format should the data be in?

Thanks in advance!

WelkinYang commented 2 years ago

Thank you for your interest in our work.

First of all, our method is not limited to any particular language. In speech synthesis and singing voice synthesis, I think synthesizing Chinese is harder than synthesizing English (Chinese is a tonal language), so if the method works well on a Chinese dataset, I expect it to work on an English dataset as well.

Second, we did not experiment on English data because we could not find a suitable English singing dataset to serve as the singing teacher. Although we used internal Chinese datasets as singing teachers, there are open-source Chinese singing datasets of the same size that can be used to reproduce our work, such as Opencpop (https://arxiv.org/abs/2201.07429).

Finally, experimenting with an English dataset on this repo is simple: prepare the phoneme set of your English data and place it in text/symbols.py. An English phoneme set can be easily obtained from the set provided by cmudict (https://github.com/mozilla/TTS/blob/e9e07844b77a43fb0864354791fb4cf72ffded11/TTS/tts/utils/text/cmudict.py), or you can just use characters (https://github.com/mozilla/TTS/blob/e9e07844b77a43fb0864354791fb4cf72ffded11/TTS/tts/utils/text/symbols.py). The data format is no different from the format of the labels in testdata; only the phoneme set is different.
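
As a rough illustration of the phoneme-set step, here is a minimal sketch of an English phoneme set built from the 39 ARPAbet symbols that cmudict uses. This is not the repo's actual text/symbols.py: the special tokens, the `build_phoneset` and `encode` helpers, and the `phone_to_id` mapping are all hypothetical names chosen for the example, and the real file may organize its symbols differently. Check the labels in testdata for the silence and padding symbols the code actually expects.

```python
# Sketch of an English phoneme set based on ARPAbet (the symbol set used by cmudict).
# Everything except the ARPAbet symbols themselves is an assumption for illustration.

# 15 ARPAbet vowels, without stress markers.
ARPABET_VOWELS = [
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
    "EY", "IH", "IY", "OW", "OY", "UH", "UW",
]

# 24 ARPAbet consonants.
ARPABET_CONSONANTS = [
    "B", "CH", "D", "DH", "F", "G", "HH", "JH", "K", "L", "M", "N",
    "NG", "P", "R", "S", "SH", "T", "TH", "V", "W", "Y", "Z", "ZH",
]

# Hypothetical special tokens (padding, silence, short pause).
# Replace these with whatever symbols your label files really contain.
SPECIAL_TOKENS = ["<pad>", "sil", "sp"]


def build_phoneset(with_stress=False):
    """Return the symbol list; optionally expand vowels with stress digits 0/1/2."""
    vowels = ARPABET_VOWELS
    if with_stress:
        vowels = [v + s for v in ARPABET_VOWELS for s in ("0", "1", "2")]
    return SPECIAL_TOKENS + vowels + ARPABET_CONSONANTS


phoneset = build_phoneset()

# Map each symbol to an integer id; "<pad>" gets id 0 because it comes first.
phone_to_id = {p: i for i, p in enumerate(phoneset)}


def encode(phones):
    """Convert a phoneme sequence to ids; raises KeyError on an unknown symbol."""
    return [phone_to_id[p] for p in phones]


if __name__ == "__main__":
    print(len(phoneset))
    print(encode(["HH", "AH", "L", "OW"]))  # "hello" without stress markers
```

Whether to keep the stress digits is a judgment call. Dropping them gives a smaller set, which may suit singing data where the melody rather than lexical stress drives prominence, but that is an assumption and not something tested in this thread.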