Open ekolodin opened 2 years ago
Yeah, we follow a generalized format control via egs/TEMPLATE/svs.sh,
where you can find a dedicated token generation stage. However, we do not support G2P conversion yet, so the format I would suggest is to use phonemized text directly.
If you just want to use the existing model for inference, we have example code in https://colab.research.google.com/github/SJTMusicTeam/svs_demo/blob/master/muskit_svs_realtime.ipynb showing how to use an existing token list (the remaining problem is then to convert your current text into the phonemes that the vocabulary contains).
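As a rough illustration of the lookup step described above, here is a minimal sketch that maps an already phonemized sequence to token IDs. The token list, the special symbols (`<blank>`, `<unk>`, `<sos/eos>`) and the helper names `build_vocab` and `phonemes_to_ids` are all assumptions made for the example; the real token list comes with the trained model, and the notebook linked above shows the actual usage.

```python
# Minimal sketch: phoneme sequence -> token IDs.
# The token list below is hypothetical; in practice, use the token list
# that ships with the trained model.

def build_vocab(token_list):
    """Map each token string to its index in the token list."""
    return {token: index for index, token in enumerate(token_list)}


def phonemes_to_ids(phonemes, vocab, unk="<unk>"):
    """Look up each phoneme, falling back to the unknown-token ID."""
    unk_id = vocab[unk]
    return [vocab.get(phoneme, unk_id) for phoneme in phonemes]


# Hypothetical vocabulary with assumed special symbols.
token_list = ["<blank>", "<unk>", "HH", "AH", "L", "OW", "<sos/eos>"]
vocab = build_vocab(token_list)

# Input must already be phonemized, since G2P conversion is not supported yet.
ids = phonemes_to_ids(["HH", "AH", "L", "OW"], vocab)
print(ids)  # [2, 3, 4, 5]
```

If a tensor is needed, the resulting list can be wrapped with `torch.tensor(ids, dtype=torch.long)`. Phonemes that are missing from the vocabulary fall back to the `<unk>` ID here, so it is worth checking the output for unexpected unknowns before running inference.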
Hello, thank you for the amazing work! I couldn't understand how to translate English text (on which I want to run inference with your model) into a torch tensor of token IDs. As far as I understand, you first convert the string to a sequence of phonemes and then to their indexes, am I right? Could you please help me with how to do it?