Regarding English dataset

Thank you for your interest in our work.

First of all, our method is not limited to a particular language. In speech synthesis and singing voice synthesis in particular, I think synthesizing Chinese is harder than synthesizing English, because Chinese is a tonal language. So if the method works well on a Chinese dataset, it should also work on an English dataset.

Second, we did not experiment on English data because we could not find a suitable English singing dataset to serve as the singing teacher. Although we used internal Chinese datasets as singing teachers, there are open-source Chinese singing datasets of similar size that can be used to reproduce our work, such as opencpop (https://arxiv.org/abs/2201.07429).

Finally, experimenting with an English dataset in this repo is simple: just put the English phoneme set into text/symbols.py. An English phoneme set can easily be taken from the one provided with cmudict (https://github.com/mozilla/TTS/blob/e9e07844b77a43fb0864354791fb4cf72ffded11/TTS/tts/utils/text/cmudict.py), or you can simply use characters (https://github.com/mozilla/TTS/blob/e9e07844b77a43fb0864354791fb4cf72ffded11/TTS/tts/utils/text/symbols.py). The data format is the same as the label format in testdata; only the phoneme set differs.
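As a rough sketch of what an English text/symbols.py could look like, the block below builds the CMUdict ARPAbet set (15 vowels, each with no stress digit or stress 0/1/2, plus 24 consonants, 84 symbols in total) and a character-level alternative. This is only an illustration under assumptions: the names `USE_PHONEMES`, `_special`, and `phonemes_to_ids` are hypothetical, and the special tokens (`_` for padding, `SP`/`AP` for silence and breath, following the opencpop labeling convention) should be replaced by whatever the existing text/symbols.py in this repo already defines.

```python
# Hypothetical English version of text/symbols.py (a sketch, not the repo's file).
# Assumption: keep whatever special tokens the original symbols.py defines;
# "_", "SP" and "AP" below are placeholders modeled on opencpop-style labels.

# ARPAbet phonemes as used by CMUdict; vowels may carry a stress digit 0/1/2.
_vowels = ["AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
           "EY", "IH", "IY", "OW", "OY", "UH", "UW"]
_consonants = ["B", "CH", "D", "DH", "F", "G", "HH", "JH", "K", "L", "M", "N",
               "NG", "P", "R", "S", "SH", "T", "TH", "V", "W", "Y", "Z", "ZH"]

# Each vowel appears bare and with stress 0, 1, 2 -> 15 * 4 + 24 = 84 symbols.
arpabet = sorted(_consonants + [v + s for v in _vowels for s in ("", "0", "1", "2")])

# Alternative: character-level symbols instead of phonemes.
_letters = list("abcdefghijklmnopqrstuvwxyz'")

# Hypothetical special tokens: padding first, then silence / breath markers.
_pad = "_"
_special = ["SP", "AP"]

USE_PHONEMES = True  # hypothetical switch: False -> use characters instead
symbols = [_pad] + _special + (arpabet if USE_PHONEMES else _letters)

# Map each symbol to its integer id (its position in `symbols`).
_symbol_to_id = {s: i for i, s in enumerate(symbols)}


def phonemes_to_ids(phonemes):
    """Convert a list of symbols (e.g. one label line, split on spaces) to ids."""
    return [_symbol_to_id[p] for p in phonemes]
```

For example, `phonemes_to_ids("HH AH0 L OW1".split())` returns the ids for "hello" in ARPAbet. Keeping the padding symbol at index 0 is a common convention in TTS code, but check how the model in this repo pads its inputs before relying on it.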

WelkinYang / Learn2Sing2.0

Regarding English dataset #4