Cannot synthesize the sound of "哈哈哈" (laughter)

FunAudioLLM / CosyVoice

A multilingual large voice generation model with full-stack support for inference, training, and deployment.

https://funaudiollm.github.io/

Apache License 2.0

2.07k stars, 185 forks

Cannot synthesize the sound of "哈哈哈" (laughter) #4

Closed MonolithFoundation closed 6 days ago

MonolithFoundation commented 1 week ago

The result sounds somewhat stiff.

ZhihaoDU commented 1 week ago

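Try just the two characters "哈哈", and try several different seeds. Also, for paralinguistic content such as laughter, we recommend the instruct model.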
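
The seed suggestion above could be scripted roughly as follows. This is a minimal sketch: `sweep_seeds`, `set_seed`, and `synthesize` are hypothetical names, not part of CosyVoice. With a real model, `set_seed` would presumably seed the framework's random number generator and `synthesize` would wrap the model's inference call; here both are stand-ins so the snippet runs without the model.

```python
import random
from typing import Callable, Dict, Iterable, TypeVar

T = TypeVar("T")


def sweep_seeds(
    synthesize: Callable[[str], T],
    text: str,
    seeds: Iterable[int],
    set_seed: Callable[[int], None] = random.seed,
) -> Dict[int, T]:
    """Run `synthesize(text)` once per seed and collect the results by seed.

    Both callables are assumptions for illustration: with a real TTS model,
    `set_seed` would seed the framework's RNG and `synthesize` would return audio.
    """
    results: Dict[int, T] = {}
    for seed in seeds:
        set_seed(seed)  # fix the RNG state so this take is reproducible
        results[seed] = synthesize(text)
    return results


def fake_synthesize(text: str) -> str:
    """Stand-in for a sampling-based TTS call: the output depends on RNG state."""
    return f"{text}#{random.randint(0, 10**6)}"


if __name__ == "__main__":
    takes = sweep_seeds(fake_synthesize, "哈哈", seeds=[0, 1, 2, 3])
    for seed, take in takes.items():
        print(seed, take)
```

Keeping the results keyed by seed makes it easy to listen to each take and then reproduce the one that sounds most natural.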

MonolithFoundation commented 1 week ago

Neither works; the results are very poor.

ZhihaoDU commented 1 week ago

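I ran a quick test using the preset-voice generation mode with the 中文女 (Chinese female) voice, and this sentence sounds fine to me:

哈哈，那真是太好了，很高兴能够帮助您的孩子学习英语。 ("Haha, that's really great; I'm glad I could help your child learn English.")

Could you try again?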

MonolithFoundation commented 1 week ago

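It still sounds rather stiff.

For interjections and filler sounds, it is still well behind ChatTTS, although the timbre is somewhat better. Are there plans to improve this?

By "laughing" I mean an actual laugh, not reading out the characters 哈哈, which sounds very stiff. Has CosyVoice considered adding vocal sounds such as laughter, "嗯" (mm), and so on?

If this requires the Instruct model, could you provide a complete template for emphasis and similar effects? With instruct, "哈哈哈" is still read out character by character.

Also, with Instruct I set the speaker to 中文男 (Chinese male), but the output is a female voice.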

ZhihaoDU commented 1 week ago

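With the instruct model, write the input like this.
A single laugh: 那位喜剧演员真有才，[laughter]一开口就让全场观众爆笑。 ("That comedian is really talented; [laughter] the moment he opens his mouth the whole audience bursts out laughing.")
Speaking while laughing: 他搞的一个恶作剧，让大家<laughter>忍俊不禁</laughter>。 ("A prank he pulled left everyone <laughter>unable to hold back their laughter</laughter>.")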

MonolithFoundation commented 1 week ago

Is there a more complete list of these tone tags? It could go in the README; otherwise nobody will know how to use them.

aluminumbox commented 1 week ago

Quoting MonolithFoundation: Is there a more complete list of these tone tags? It could go in the README; otherwise nobody will know how to use them.

We will list them in the README later.

lukecq1231 commented 1 week ago

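Four fine-grained tags are currently supported: inserting laughter, inserting a breath, speaking while laughing, and emphasis. Here are concrete examples: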

Well that's kind of scary [laughter].
I don't think I over eat yeah [breath] and um I do exercise regularly.
Well that pretty much covers <laughter>the subject</laughter> well thanks for calling me.
The team's <strong>unity</strong> and <strong>resilience</strong> helped them win the championship.
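
To make the four tags concrete, here is a small sketch that checks a text before it is sent to the instruct model. `check_tags` is a hypothetical helper, not a CosyVoice API. The `[laughter]`, `[breath]`, and `<laughter>…</laughter>` spellings follow this thread; `<strong>…</strong>` for emphasis is assumed from the project's README examples and should be verified there.

```python
import re
from typing import List

# Standalone markers and paired (open/close) tags, as used in this thread.
# <strong> for emphasis is an assumption; verify it against the README.
STANDALONE = {"[laughter]", "[breath]"}
PAIRED = {"laughter", "strong"}

# Matches either a [bracketed] marker or an <opening>/</closing> tag.
_TOKEN = re.compile(r"\[[^\[\]]*\]|</?[A-Za-z]+>")


def check_tags(text: str) -> List[str]:
    """Return a list of problems found in the fine-grained tags of `text`."""
    problems: List[str] = []
    open_stack: List[str] = []
    for match in _TOKEN.finditer(text):
        token = match.group(0)
        if token.startswith("["):
            if token not in STANDALONE:
                problems.append(f"unknown marker {token}")
            continue
        name = token.strip("</>")
        if name not in PAIRED:
            problems.append(f"unknown tag {token}")
        elif not token.startswith("</"):
            open_stack.append(name)  # opening tag: remember it
        elif open_stack and open_stack[-1] == name:
            open_stack.pop()  # closing tag matches the innermost open tag
        else:
            problems.append(f"unexpected closing tag {token}")
    problems.extend(f"unclosed tag <{name}>" for name in open_stack)
    return problems


if __name__ == "__main__":
    print(check_tags("Well that's kind of scary [laughter]."))
    print(check_tags("让大家<laughter>忍俊不禁"))
```

Note that this sketch treats any bracketed text as a marker, so ordinary brackets in the input would be reported as unknown.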

lukecq1231 commented 1 week ago

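Quoting MonolithFoundation:

It still sounds rather stiff.

For interjections and filler sounds, it is still well behind ChatTTS, although the timbre is somewhat better. Are there plans to improve this?

By "laughing" I mean an actual laugh, not reading out the characters 哈哈, which sounds very stiff. Has CosyVoice considered adding vocal sounds such as laughter, "嗯" (mm), and so on?

If this requires the Instruct model, could you provide a complete template for emphasis and similar effects? With instruct, "哈哈哈" is still read out character by character.

Also, with Instruct I set the speaker to 中文男 (Chinese male), but the output is a female voice.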

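The LM in the Instruct model does not use a speaker embedding, so a specific timbre cannot be fixed by supplying only the speaker embedding. As a result, even when the 中文男 (Chinese male) speaker embedding is given, there is some probability of getting a female voice. If you want the Instruct model to produce a more stable timbre, we suggest using it in continuation mode: give the LM an audio prompt together with its corresponding text prompt, and also give the flow matching model the audio prompt.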
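
The explanation above can be restated as a sketch of which conditioning inputs each sub-model receives in the two usage patterns. Everything here is illustrative: the `Conditioning` and `Setup` classes, their field names, and both builder functions are hypothetical and do not mirror CosyVoice's actual code. That the speaker embedding still reaches the flow matching model in the first pattern is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conditioning:
    """What one sub-model is conditioned on (hypothetical field names)."""
    speaker_embedding: bool = False
    audio_prompt: bool = False
    text_prompt: bool = False


@dataclass(frozen=True)
class Setup:
    lm: Conditioning
    flow_matching: Conditioning

    def timbre_pinned_in_lm(self) -> bool:
        """True if the LM sees the target voice via an embedding or an audio prompt."""
        return self.lm.speaker_embedding or self.lm.audio_prompt


def instruct_with_speaker_id() -> Setup:
    # Per the comment above: the Instruct LM does not use the speaker embedding.
    # Assumption: the embedding is still passed to the flow matching model.
    return Setup(lm=Conditioning(), flow_matching=Conditioning(speaker_embedding=True))


def continuation() -> Setup:
    # The LM gets an audio prompt plus its transcript; flow matching gets the audio prompt.
    return Setup(
        lm=Conditioning(audio_prompt=True, text_prompt=True),
        flow_matching=Conditioning(audio_prompt=True),
    )


if __name__ == "__main__":
    print(instruct_with_speaker_id().timbre_pinned_in_lm())
    print(continuation().timbre_pinned_in_lm())
```

In the first pattern nothing tells the LM which voice to produce, which is consistent with the occasional female output; in continuation mode the audio prompt gives the LM that information.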

MonolithFoundation commented 5 days ago

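That is quite a pain, then.

First, only instruct supports these tone tags, and then it cannot pin down the timbre.

I need the male voice from the sft model; that timbre is decent. How can I achieve this?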