anhnh2002 / XTTSv2-Finetuning-for-New-Languages

Text preprocessing #3

Closed ScottishFold007 closed 1 month ago

ScottishFold007 commented 1 month ago

If the data is not in English, does the text need any preprocessing? For example, should Chinese text be converted to Pinyin?

anhnh2002 commented 1 month ago

You don't need to convert Chinese text to Pinyin. However, it is highly recommended to normalize the text, i.e. to spell out numbers and special characters as words. Fortunately, Coqui already supports this for Chinese: https://github.com/nguyenhoanganh2002/XTTSv2-Finetuning-for-New-Languages/blob/95866555d61dba2ddb014af4af51e58ba6aade78/TTS/tts/layers/xtts/tokenizer.py#L524
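
To make "normalize numbers and special characters into words" concrete, here is a rough, hand-written sketch of the idea for Chinese. It is not the code in the linked tokenizer: the helper names (`int_to_zh`, `normalize_zh`) and the lookup table are hypothetical, and it only covers ASCII digits and the percent sign.

```python
import re

# Hypothetical lookup tables for illustration only (not taken from the Coqui tokenizer).
DIGITS = {"0": "零", "1": "一", "2": "二", "3": "三", "4": "四",
          "5": "五", "6": "六", "7": "七", "8": "八", "9": "九"}
UNITS = ["", "十", "百", "千"]  # ones, tens, hundreds, thousands


def int_to_zh(n: int) -> str:
    """Read an integer in the range 0..9999 as Chinese words."""
    if n == 0:
        return "零"
    digits = str(n)
    parts = []
    pending_zero = False
    for i, ch in enumerate(digits):
        unit = UNITS[len(digits) - 1 - i]
        if ch == "0":
            pending_zero = True  # emit a single 零 only if a non-zero digit follows
        else:
            if pending_zero and parts:
                parts.append("零")
            pending_zero = False
            parts.append(DIGITS[ch] + unit)
    result = "".join(parts)
    # 10..19 are read 十, 十一, ... rather than 一十, 一十一, ...
    if result.startswith("一十"):
        result = result[1:]
    return result


def _read_number(token: str) -> str:
    # Long numbers and numbers with leading zeros (IDs, codes) are read digit by digit.
    if len(token) > 4 or (len(token) > 1 and token[0] == "0"):
        return "".join(DIGITS[c] for c in token)
    return int_to_zh(int(token))


def normalize_zh(text: str) -> str:
    """Spell out ASCII digits and percentages so the text contains only words."""
    # "50%" is read 百分之五十, so handle percentages before bare numbers.
    text = re.sub(r"([0-9]+)%", lambda m: "百分之" + _read_number(m.group(1)), text)
    text = re.sub(r"[0-9]+", lambda m: _read_number(m.group(0)), text)
    return text


print(normalize_zh("音量增加了50%，共3个文件"))
```

You would only need something like this for cases the built-in cleaners do not cover; otherwise relying on the tokenizer's own normalization is simpler.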