Open Cardroid opened 2 years ago
Hi, many thanks for your interest in our projects!
We have not yet tested the text modules intensively. Also, given the limited data available for SVS, we haven't found a working solution for directly removing the dependency on phoneme information.
One possible workaround would be to distribute the phoneme durations equally across the duration of each word and let the seq2seq model learn the actual durations during training. It still requires alignment at the word level, though. We have tried that on the CSD corpus and it works well with Korean syllables. I'm preparing the PR now and will update here shortly.
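To make the idea concrete, here is a minimal sketch of the even split, assuming you already have a start and end time for each word (or syllable) and a list of its phonemes. The function name `distribute_evenly` and the phoneme symbols are hypothetical and only for illustration; this is not the code from the PR or from the recipes.

```python
from typing import List, Tuple


def distribute_evenly(
    word_start: float, word_end: float, phonemes: List[str]
) -> List[Tuple[str, float, float]]:
    """Split the span [word_start, word_end] into equal slices, one per phoneme.

    Hypothetical helper: returns (phoneme, start, end) tuples in seconds.
    """
    if word_end < word_start:
        raise ValueError("word_end must not be earlier than word_start")
    if not phonemes:
        return []

    step = (word_end - word_start) / len(phonemes)
    segments = []
    for i, phoneme in enumerate(phonemes):
        start = word_start + i * step
        # Pin the last boundary to word_end so rounding never leaves a gap.
        is_last = i == len(phonemes) - 1
        end = word_end if is_last else word_start + (i + 1) * step
        segments.append((phoneme, start, end))
    return segments


# Example: a word aligned to 1.0 s - 2.0 s with four placeholder phonemes.
for phoneme, start, end in distribute_evenly(1.0, 2.0, ["p1", "p2", "p3", "p4"]):
    print(f"{start:.2f} {end:.2f} {phoneme}")
```

The resulting segments are only a rough initialization; the point of the workaround is that the model is left to learn the real durations during training.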
Thank you👍 I'm looking forward to it!
First of all, I would like to express my gratitude for creating a wonderful project.👍
I saw that there are various tokenizer implementations under the text folder. However, I couldn't find a recipe using these options.
I don't have phoneme labels in my own dataset. I could create them, but it would be nice to be able to use the dataset without doing so.
If possible, could you tell me how to train models and run inference without phoneme labels?