FunAudioLLM / CosyVoice

Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
https://funaudiollm.github.io/
Apache License 2.0
4.56k stars, 461 forks

Synthesized audio issues #178

Open yiwei0730 opened 1 month ago

yiwei0730 commented 1 month ago

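Many thanks to the development team! While testing synthesis with my own reference audio, I ran into several problems:

  1. Pronunciation is not always accurate (some characters come out wrong).
  2. Extra words get added automatically. Sometimes this seems to loop endlessly; other times only a few already-synthesized words are appended.
  3. The prosody sometimes speeds up and slows down unpredictably, a bit too freely.
  4. Multilingual synthesis seems to have problems; I am not sure whether combining many languages is simply unsupported.

I would appreciate a response on these issues. Thank you.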

Overall, the audio quality is good, the voice similarity is high, and the fluency sounds natural.

aluminumbox commented 1 month ago
  1. Try using different words instead.
  2. Reduce the sentence length; LLM performance deteriorates as sentences get longer.
  3. This is a drawback of the LLM that we cannot control; try a different seed.
  4. Cross-lingual synthesis is not stable because we have not constructed enough cross-lingual data. Try adding a language tag at the start of the sentence, such as <|zh|> or <|en|>; it may fix some cases.
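
For point 2, one way to keep each input short is to split the text at sentence-ending punctuation before synthesis and feed the pieces one at a time. The following is only a minimal sketch: `split_text` and its `max_chars` default are hypothetical helpers for illustration, not part of CosyVoice, and the punctuation set is an assumption.

```python
import re

# Zero-width split point right after Chinese or English sentence-ending
# punctuation. The punctuation set is an assumption; extend it as needed.
_SENTENCE_END = re.compile(r"(?<=[。!?;!?;.])")


def split_text(text, max_chars=40):
    """Split text into sentences, then greedily merge them into chunks.

    A chunk is closed as soon as adding the next sentence would push it
    past max_chars. A single sentence longer than max_chars is kept whole.
    """
    pieces = [p for p in _SENTENCE_END.split(text) if p.strip()]
    chunks, current = [], ""
    for piece in pieces:
        if current and len(current) + len(piece) > max_chars:
            chunks.append(current.strip())
            current = piece
        else:
            current += piece
    if current.strip():
        chunks.append(current.strip())
    return chunks


if __name__ == "__main__":
    for chunk in split_text("今天天气很好。我们去公园吧!好的。", max_chars=8):
        print(chunk)
```

Each chunk would then be synthesized separately and the resulting audio concatenated. Note that this naive rule also splits at an ASCII period inside numbers such as "3.14", so real text may need a more careful splitter.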

shirubei commented 1 month ago

Regarding the mixed-language case, I tried the following, and it does work.

<|zh|>那份意外的惊喜与深深的祝福让我心中充满了甜蜜的快乐。<|zh|><|en|>Windows<|en|><|zh|>下可用哦。<|zh|>
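
The tagging pattern in the example above can be produced automatically for text that mixes Chinese and English. The sketch below is an illustration only: `tag_languages` is a hypothetical helper, not a CosyVoice function, and it assumes that any run of Latin letters is English and everything else is Chinese. With `wrap=True` it places the tag on both sides of each segment, as in the example above; with `wrap=False` it only prefixes each segment, which is closer to the suggestion of adding the label at the sentence start. Which form the model handles better is not established in this thread.

```python
import re

# A run starting with a Latin letter is treated as English; all other text
# is treated as Chinese. This two-language split is an assumption.
_EN_RUN = re.compile(r"[A-Za-z][A-Za-z0-9 '\-]*")


def tag_languages(text, wrap=True):
    """Insert <|zh|> / <|en|> tags around each language segment of text."""
    segments = []
    pos = 0
    for match in _EN_RUN.finditer(text):
        if match.start() > pos:
            segments.append(("zh", text[pos:match.start()]))
        segments.append(("en", match.group()))
        pos = match.end()
    if pos < len(text):
        segments.append(("zh", text[pos:]))

    out = []
    for lang, segment in segments:
        tag = f"<|{lang}|>"
        # wrap=True: tag on both sides; wrap=False: tag as a prefix only.
        out.append(f"{tag}{segment}{tag}" if wrap else f"{tag}{segment}")
    return "".join(out)


if __name__ == "__main__":
    print(tag_languages("那份意外的惊喜与深深的祝福让我心中充满了甜蜜的快乐。Windows下可用哦。"))
```

Punctuation that follows an English run is assigned to the neighbouring Chinese segment, which is a simplification; text in other languages would need additional rules and tags.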