Naozumi520 / Bert-VITS2-Cantonese-Yue

vits2 backbone with multilingual-bert, modified to support Cantonese
GNU Affero General Public License v3.0

Transcription #4

Closed kexul closed 6 months ago

kexul commented 6 months ago

This is a discussion thread rather than an issue. I put it here since there is no discussion space in this repo. Any help is much appreciated!

  1. I found that Whisper ASR is not accurate enough, even with large-v3. Is there a better model for transcription? I've tried FunASR; the results seem better, but they still need a lot of manual fixing.
  2. Do we need punctuation in the training .list file?

Naozumi520 commented 6 months ago

  1. iic/speech_UniASR_asr_2pass-cantonese-CHS-16k-common-vocab1468-tensorflow1-online on ModelScope is the best one I've tried. The transcribed characters may not be accurate, but the pronunciation is correct.

  2. Yes. Punctuation is very important; the model gets its BERT features from it.

kexul commented 6 months ago

Thanks for the information! 🤗 I'm using FunASR too! As you said, the words are not accurate, but the pronunciation is correct. OK! I may need to fix the punctuation manually; the FunASR punctuation is a mess on my end.
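
For anyone doing the same manual pass, here is a minimal sketch of a helper that flags entries in a training .list file whose transcript has no punctuation at all, so they can be reviewed first. It assumes each line has the form wav_path|speaker|language|text, which this thread does not confirm. The function name, the punctuation set, the sample paths, and the YUE language tag are all hypothetical.

```python
import re

# Assumption (not confirmed in this thread): each .list line looks like
#   wav_path|speaker|language|text
# The punctuation set below is an illustrative choice, not the project's own.
PUNCT = re.compile(r"[，。！？、；：,.!?;:…]")


def lines_missing_punctuation(lines):
    """Return (line_number, text) for entries whose text has no punctuation."""
    flagged = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Only the last field (the transcript) is checked, so dots in the
        # wav path do not count as punctuation.
        text = line.split("|", 3)[-1]
        if not PUNCT.search(text):
            flagged.append((number, text))
    return flagged


# Hypothetical sample entries; real files would be read with open(...).
sample = [
    "a.wav|spk|YUE|你好，今日天氣好好。",
    "b.wav|spk|YUE|我哋去飲茶",
]
print(lines_missing_punctuation(sample))
```

With the sample above, only the second entry is flagged. This only finds transcripts with no punctuation; it cannot tell whether existing punctuation is in the right place.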