How to optimize the mispronunciation of Chinese polyphonic characters

SWivid / F5-TTS

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

https://arxiv.org/abs/2410.06885

MIT License

How to optimize the mispronunciation of Chinese polyphonic characters #259

Status: Closed (closed by LSliu666 1 month ago)

LSliu666 commented 1 month ago

How can I fix the mispronunciation of Chinese polyphonic characters?

SWivid commented 1 month ago

@LSliu666 Hi, see #193

LSliu666 commented 1 month ago

Okay, thank you. Is there any optimization for determining the pronunciation of a polyphonic character from the phrase it appears in? @SWivid

SWivid commented 1 month ago

@LSliu666 It mainly depends on the dataset. A good dataset that covers most cases and is transcribed correctly will let the model do better on this.

LSliu666 commented 1 month ago

I am using Gradio. How can I specify the pinyin of the text so that it is not read incorrectly?

LSliu666 commented 1 month ago

@SWivid

SWivid commented 1 month ago

There is currently no such feature in Gradio. You would need to go through the code, partly as described in #193. PRs are welcome~

LSliu666 commented 1 month ago

Is code inference possible?

SWivid commented 1 month ago

What do you mean by code inference, the infer-cli?

LSliu666 commented 1 month ago

yes

SWivid commented 1 month ago

Yes, it is just as we suggested above:

Need to see through the code, part as in https://github.com/SWivid/F5-TTS/issues/193.

Or you could simply substitute a homophone, e.g. 好奇 -> 浩奇
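A minimal sketch of that substitution workaround, assuming you keep a small hand-written table of words the model misreads. The names `HOMOPHONE_OVERRIDES` and `apply_overrides` are hypothetical and not part of F5-TTS; the only entry shown is the example from this thread.

```python
import re

# Hypothetical override table: word as written -> homophone spelling that
# forces the intended reading. Entries are hand-picked by the user.
HOMOPHONE_OVERRIDES = {
    "好奇": "浩奇",  # example given in this thread
}


def apply_overrides(text, overrides=HOMOPHONE_OVERRIDES):
    """Replace every listed word with its homophone spelling in one pass."""
    if not overrides:
        return text
    # Longest keys first, so a longer phrase wins over a shorter one it contains.
    keys = sorted(overrides, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    return pattern.sub(lambda match: overrides[match.group(0)], text)


gen_text = apply_overrides("他对这件事很好奇。")
print(gen_text)
```

The rewritten string is what you would pass as the text to generate. Text that contains none of the listed words passes through unchanged.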

LSliu666 commented 1 month ago

f5-tts_infer-cli \
  --model "F5-TTS" \
  --ref_audio "ref_audio.wav" \
  --ref_text "The content, subtitle or transcription of reference audio." \
  --gen_text "Some text you want TTS model generate for you."

Should I fill in the pinyin in gen_text?

LSliu666 commented 1 month ago

Yeah, I can replace the polyphonic character with one of its homophones. Thanks.

LSliu666 commented 1 month ago

Replace the polyphonic character with one of its homophones.

LSliu666 commented 1 month ago

Pursuing accurate pronunciation rather than word accuracy, haha, interesting

SWivid commented 1 month ago

@LSliu666 It mainly depends on the dataset. A good dataset that covers most cases and is transcribed correctly will let the model do better on this.

@LSliu666 Or you could provide us with some data, and we will use it for training.

Pursuing accurate pronunciation rather than word accuracy, haha, interesting

It's easy to be an armchair critic.

LSliu666 commented 1 month ago

Sorry, I don't have any relevant data.

LSliu666 commented 1 month ago

Pursuing accurate pronunciation rather than word accuracy, haha, interesting

What I meant is that I have found a workaround for mispronounced polyphonic characters: replace the polyphonic character with a homophone so that the pronunciation comes out right. The goal is a consistent pronunciation, without worrying about whether the written character is correct. For example, 穿着 may be read as chuanzhe, but if I write 穿卓 it is read as chuanzhuo, which is the intended reading.
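One way to live with the "wrong character" trade-off is to keep the original text for display and send only the rewritten copy to the model. The sketch below builds the infer-cli command from the flags quoted earlier in this thread. The helper `build_infer_command` and the sample sentence are assumptions for illustration, and the command is only printed, not executed.

```python
import shlex


def build_infer_command(ref_audio, ref_text, gen_text, model="F5-TTS"):
    """Assemble the argument list for f5-tts_infer-cli (flags as quoted in this thread)."""
    return [
        "f5-tts_infer-cli",
        "--model", model,
        "--ref_audio", ref_audio,
        "--ref_text", ref_text,
        "--gen_text", gen_text,
    ]


# Text as it should appear on screen or in subtitles.
display_text = "她的穿着很讲究。"
# Copy sent to the model: 穿着 spelled as 穿卓 to force the chuanzhuo reading.
tts_text = display_text.replace("穿着", "穿卓")

cmd = build_infer_command(
    ref_audio="ref_audio.wav",
    ref_text="Transcription of the reference audio.",
    gen_text=tts_text,
)
print(shlex.join(cmd))  # pass cmd to subprocess.run(cmd, check=True) to actually run it
```

Because 穿着 is also legitimately read as chuanzhe in other sentences, a blanket replacement like this is only safe when you know which reading the sentence needs.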

SWivid commented 1 month ago

@LSliu666 Sure, I hope this temporary strategy helps. We don't mean to limit the model's ability here; rather, the model is limited by the dataset. (If you could provide a large amount of perfect data, we could train directly on Unicode characters and let the model learn the semantics itself.)

Feel free to open an issue if you have further questions.

LSliu666 commented 1 month ago

Okay, thank you for your answer