Continue training llama with new data

fishaudio / fish-speech

Brand new TTS solution

https://speech.fish.audio

Other

6.52k stars, 508 forks

Continue training llama with new data #129

Closed. Naozumi520 closed this issue 2 months ago.

Naozumi520 commented 3 months ago

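TL;DR: I modified g2p and symbols to support Cantonese. After training on Cantonese data everything worked, but after fine-tuning on a specific character's data the model can no longer produce speech.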
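The issue does not show how g2p and symbols were changed. As a minimal hypothetical sketch, assuming the g2p step already outputs Jyutping romanization (e.g. `gwong2 dung1 waa2`), one common way to build phone symbols is to split each syllable into an initial and a tone-bearing final. The phone strings this produces are the kind of new entries that would have to be added to the symbol list. `split_jyutping` and `to_symbols` are illustrative names, not fish-speech code.

```python
# Hypothetical Jyutping -> phone-symbol splitter; not fish-speech's actual g2p.
# Initials are listed longest-first so "gw"/"kw"/"ng" match before "g"/"k"/"n".
INITIALS = ["gw", "kw", "ng", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "w", "z", "c", "s", "j"]
SYLLABIC_NASALS = {"m", "ng"}  # e.g. m4 (唔), ng5 (五): no initial, nasal is the final


def split_jyutping(syllable: str) -> tuple[str, str, int]:
    """Split one Jyutping syllable such as 'gwong2' into (initial, final, tone)."""
    s = syllable.strip().lower()
    if len(s) < 2 or s[-1] not in "123456":
        raise ValueError(f"expected a Jyutping syllable ending in tone 1-6: {syllable!r}")
    body, tone = s[:-1], int(s[-1])
    if body in SYLLABIC_NASALS:
        return "", body, tone
    for ini in INITIALS:
        # Require something after the initial so the final is never empty.
        if body.startswith(ini) and len(body) > len(ini):
            return ini, body[len(ini):], tone
    return "", body, tone  # zero-initial syllable, e.g. 'aa3', 'oi3'


def to_symbols(jyutping: str) -> list[str]:
    """Turn a space-separated Jyutping string into symbols like ['gw', 'ong2']."""
    out: list[str] = []
    for syl in jyutping.split():
        ini, fin, tone = split_jyutping(syl)
        if ini:
            out.append(ini)
        out.append(f"{fin}{tone}")  # the tone is attached to the final
    return out
```

For example, `to_symbols("gwong2 dung1 waa2")` yields `['gw', 'ong2', 'd', 'ung1', 'w', 'aa2']`. Every distinct symbol like these is a token the pretrained model has never seen, so it has to be learned from the new data.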

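First I trained on male voices, and the results were good. Then, using the same approach I use for bert-vits2, I continued training the model on data from a character I like. I asked stardust in the QQ group and was told this is perfectly fine for bert-vits2, and the results there do sound good. Following the same logic with fish-speech, however, the model can no longer produce sound after I continue training llama on the new data.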

Why does this happen? Could anyone give me some ideas?

leng-yue commented 3 months ago

How much data did you use?

Naozumi520 commented 3 months ago

10 hours of Cantonese. Once the model could speak Cantonese, I fine-tuned it on 30 minutes of the character's data.
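The thread keeps returning to how many hours of audio are enough (10 hours, then 30 minutes, later around 100 hours). A minimal sketch for measuring this, assuming the audio is stored as WAV files in one folder per speaker (`root/<speaker>/*.wav`); `hours_per_speaker` is a hypothetical helper, not part of fish-speech, and other audio formats would need a different reader than the standard-library `wave` module.

```python
import wave
from collections import defaultdict
from pathlib import Path


def hours_per_speaker(root: str) -> dict[str, float]:
    """Sum WAV durations under root/<speaker>/*.wav and return hours per speaker."""
    totals: dict[str, float] = defaultdict(float)
    for wav_path in sorted(Path(root).glob("*/*.wav")):
        with wave.open(str(wav_path), "rb") as w:
            # Duration in seconds = number of frames / frames per second.
            seconds = w.getnframes() / w.getframerate()
        totals[wav_path.parent.name] += seconds / 3600.0
    return dict(totals)
```

Calling `hours_per_speaker("data")` and summing the values gives the total hours to compare against the rough "at least around 100 hours" figure mentioned later in the thread. The per-speaker breakdown also shows whether one voice dominates a mixed dataset.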

leng-yue commented 3 months ago

That's far too little for an LLM; with that amount the model simply memorizes the data.

Naozumi520 commented 3 months ago

Would LoRA help?

leng-yue commented 3 months ago

Not much. Even a small dataset generally needs to be at least around 100 hours.
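Why LoRA does not fix a data shortage can be seen from what it does. It freezes the pretrained weight `W` and trains only a low-rank update, `y = W x + (alpha / r) * B (A x)`. That makes training cheaper and limits drift from the base model, but it adds no new information about unseen tokens. The sketch below is the generic LoRA formulation in pure Python, not fish-speech's implementation. The class and parameter names are illustrative.

```python
import random


def matvec(m: list[list[float]], x: list[float]) -> list[float]:
    """Multiply a row-major matrix by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


class LoRALinear:
    """Generic LoRA linear layer: frozen W plus a trainable low-rank update B @ A."""

    def __init__(self, weight: list[list[float]], rank: int, alpha: float, seed: int = 0):
        rng = random.Random(seed)
        self.weight = weight                      # frozen base weight, d_out x d_in
        d_out, d_in = len(weight), len(weight[0])
        self.scale = alpha / rank
        # A (r x d_in) gets a small random init; B (d_out x r) starts at zero,
        # so before any training the adapter leaves the base model unchanged.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x: list[float]) -> list[float]:
        base = matvec(self.weight, x)
        delta = matvec(self.b, matvec(self.a, x))
        return [y + self.scale * d for y, d in zip(base, delta)]

    def trainable_params(self) -> int:
        # Only A and B are trained: r * d_in + d_out * r parameters.
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])


def full_params(d_out: int, d_in: int) -> int:
    """Parameters updated by full fine-tuning of the same layer."""
    return d_out * d_in
```

For a 4096 x 4096 projection with rank 8, LoRA trains 8 * 4096 + 4096 * 8 = 65,536 parameters instead of 16,777,216. This is a large saving in compute, but the amount of Cantonese or character audio the model learns from stays the same.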

Naozumi520 commented 3 months ago

Hmm... is this catastrophic forgetting? The results before fine-tuning were all good.

leng-yue commented 3 months ago

The model has never seen these codebook tokens, so you can't expect it to perform well...

Naozumi520 commented 3 months ago

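Also, can I train on mixed data without separating speakers? Or rather, how should data from different speakers be used to train a new language? I've never been clear on this: if training is split by speaker, doesn't that just create a separate character for each speaker?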

Naozumi520 commented 3 months ago

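I trained on 100 hours of data and the results are finally good, but sometimes a single sentence ends up mixing the timbres of different speakers. Is there any technique to remove the speaker characteristics?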

leng-yue commented 3 months ago

You can use SVC, or fine-tune with speaker information.
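One generic reading of "fine-tune with speaker information" is to tell the model who is speaking in every training example, so it can condition on speaker instead of blending voices within one sentence. The sketch below shows this under loud assumptions. It assumes a `root/<speaker>/<clip>.wav` layout with a matching `.lab` transcript per clip, and it marks speakers with a hypothetical `[speaker]` text prefix. Neither the prefix format nor the JSONL manifest is fish-speech's actual mechanism; the helper names are also hypothetical.

```python
import json
from pathlib import Path


def build_manifest(root: str, tag_speakers: bool = True) -> list[dict]:
    """Pair root/<speaker>/<clip>.wav with its .lab transcript.

    With tag_speakers=True, a hypothetical '[speaker]' prefix is added to each
    transcript so the model sees which speaker produced each utterance.
    """
    rows = []
    for lab in sorted(Path(root).glob("*/*.lab")):
        audio = lab.with_suffix(".wav")
        if not audio.exists():
            continue  # skip transcripts that have no matching audio file
        speaker = lab.parent.name
        text = lab.read_text(encoding="utf-8").strip()
        if tag_speakers:
            text = f"[{speaker}] {text}"
        rows.append({"audio": str(audio), "speaker": speaker, "text": text})
    return rows


def write_jsonl(rows: list[dict], out_path) -> None:
    """Write one JSON object per line, keeping non-ASCII transcripts readable."""
    with open(out_path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
```

With `tag_speakers=False` the same function produces the pooled, speaker-agnostic data described earlier in the thread. This makes it easy to compare the two setups on the same files.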

Naozumi520 commented 3 months ago

So will voice cloning from a reference work if I use SVC to convert my data to a single character? With the mixed dataset, voice cloning failed to work even after I fine-tuned on a single character. Alternatively, how do I prevent the model from learning the voice? To make LLaMA speak Cantonese I have to increase the steps as the docs say, but this also makes it learn the voice.

leng-yue commented 3 months ago

Using SVC to convert your data to a single speaker may help.

MokII2 commented 1 month ago

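Quoting Naozumi520: "I trained on 100 hours of data and the results are finally good, but sometimes a single sentence ends up mixing the timbres of different speakers. Is there any technique to remove the speaker characteristics?"

Did it work? 100 hours of Cantonese data?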

Naozumi520 commented 1 month ago


Not very well. Also, the newer model removed g2p support, so even more data is required.