SJTMusicTeam / Muskits

An open-source music processing toolkit
Apache License 2.0
311 stars 44 forks

Translate English text to model input #135

Open ekolodin opened 2 years ago

ekolodin commented 2 years ago

Hello, thank you for the amazing work! I couldn't figure out how to translate English text (on which I want to run inference with your model) into a torch tensor of token IDs. As far as I understand, you first convert the string to a sequence of phonemes and then to their indices, am I right? Could you please help me with how to do this?

ftshijt commented 2 years ago

Yeah, we follow a generalized format controlled via egs/TEMPLATE/svs.sh, where you can find a dedicated token generation stage. However, for now we do not support G2P conversion, so the format I would suggest is to use the phonemized text directly.

ftshijt commented 2 years ago

If you just want to use an existing model for inference, we have example code in https://colab.research.google.com/github/SJTMusicTeam/svs_demo/blob/master/muskit_svs_realtime.ipynb showing how to use the existing token list (the problem would then be to convert your current text into phonemes that the vocabulary contains).
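
As a rough illustration of the last step (phonemes to token IDs), here is a minimal sketch in plain Python. It is not the Muskits API: `TOKEN_LIST`, `build_token2id`, and `phonemes_to_ids` are hypothetical names, and the token list is made up for the example. A real token list would come from the trained model's configuration, and the input is assumed to be already phonemized, since G2P is not supported.

```python
from typing import Dict, List, Sequence

# Hypothetical token list for illustration only; the real one ships with the model.
TOKEN_LIST = ["<blank>", "<unk>", "AH", "EH", "HH", "L", "OW", "<sos/eos>"]


def build_token2id(token_list: Sequence[str]) -> Dict[str, int]:
    """Map each token to its position in the token list."""
    return {token: index for index, token in enumerate(token_list)}


def phonemes_to_ids(
    phonemes: Sequence[str], token2id: Dict[str, int], unk: str = "<unk>"
) -> List[int]:
    """Look up each phoneme; phonemes missing from the vocabulary map to the unk ID."""
    unk_id = token2id[unk]
    return [token2id.get(phoneme, unk_id) for phoneme in phonemes]


token2id = build_token2id(TOKEN_LIST)

# Input is assumed to be space-separated phonemes, not raw English text.
ids = phonemes_to_ids("HH AH L OW".split(), token2id)
print(ids)

# If PyTorch is installed, the list can then be wrapped as a tensor, e.g.
# torch.tensor(ids, dtype=torch.long)
```

Any phoneme that maps to the unknown ID is a sign that the phoneme set used to phonemize the text does not match the model's vocabulary, which is worth checking before running inference.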