RVC-Boss / GPT-SoVITS

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
MIT License
29.83k stars 3.44k forks

Thanks to everyone for the outstanding work. A question about inference_webui.py #837

Open ThornbirdZhang opened 4 months ago

ThornbirdZhang commented 4 months ago

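Thank you very much for such an outstanding open-source project; it greatly lowers the barrier to personalized TTS. I have two questions about inference_webui.py when I use it for zero shot, that is, running TTS on top of the shared base model with only an 8-second reference audio clip.

  1. In get_tts_wav, the text to be synthesized is processed by phones2,bert2,norm_text2=get_phones_and_bert(text, text_language). Judging from the later log line, print(i18n("前端处理后的文本(每句):"), norm_text2) ("front-end processed text (per sentence)"), the text appears to be processed according to "front-end" rules. For the input of a sweet savour onto the LORD, this processing outputs of a sweet savour unto the L O R D, and in the output audio the word is pronounced as individual letters. What is the purpose of this "front-end processing", and how can problems like L O R D be avoided?
  2. Occasionally an entire sentence becomes silent: given {sentence A}{sentence B}{sentence C}, the position of sentence B is silent. What causes this error?

Any help is much appreciated!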

RVC-Boss commented 4 months ago

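https://github.com/RVC-Boss/GPT-SoVITS/issues/586 You may need to convert it to lowercase, because the all-uppercase rule is reserved for cases such as USB, where the letters should be read out one by one.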
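
The lowercase workaround can be applied as a preprocessing step before the text reaches the front end. The following is only a minimal sketch, not part of GPT-SoVITS: the function name soften_all_caps and the allow-list KEEP_UPPER are hypothetical, and which acronyms should stay uppercase is an assumption you would adjust for your own text.

```python
import re

# Hypothetical allow-list: acronyms that should still be read letter by letter.
KEEP_UPPER = {"USB"}

# Words made of two or more uppercase ASCII letters (single letters such as "I" are left alone).
_ALL_CAPS = re.compile(r"\b[A-Z]{2,}\b")


def soften_all_caps(text, keep=KEEP_UPPER):
    """Lowercase all-caps words unless they appear in the allow-list."""
    def repl(match):
        word = match.group(0)
        return word if word in keep else word.lower()

    return _ALL_CAPS.sub(repl, text)


print(soften_all_caps("of a sweet savour unto the LORD"))
print(soften_all_caps("plug in the USB drive"))
```

With this in place, LORD reaches the front end as lord, while USB is passed through unchanged and is still spelled out.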

ThornbirdZhang commented 4 months ago

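For 1, I had already followed the same approach in my processing, converting the specific uppercase combinations I find to lowercase. For 2, this problem is really troubling. Do I need to provide the text that was used? I found another case today.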

ThornbirdZhang commented 4 months ago

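content.zip content.srt is the SRT file and firstParagraph.mp3 is the result file. In the first sentence, after "and for his son," the following "and for his daughter" is silent, and then "and for his brother" is spoken twice. The same problem appears later in the file. I would appreciate your help.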

KamioRinn commented 4 months ago

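During inference, highly similar clauses within the same sentence tend to trigger repetition. I recommend splitting the sentence at punctuation marks.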

https://github.com/RVC-Boss/GPT-SoVITS/assets/63162909/0183c579-96cd-4d6b-a659-72ce392e712c
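
Splitting at punctuation can also be done before the text is handed to inference. This is a rough sketch under stated assumptions: split_by_punctuation is a hypothetical helper, not the project's own cut function, and the set of punctuation marks is an assumption. It keeps each mark attached to the clause it ends.

```python
import re

# A clause is a run of non-punctuation characters followed by its closing punctuation.
# The punctuation set (ASCII plus common full-width marks) is an assumption.
_CLAUSE = re.compile(r"[^,.;?!,。;?!]+[,.;?!,。;?!]*")


def split_by_punctuation(text):
    """Return the clauses of `text`, each with its trailing punctuation."""
    return [c.strip() for c in _CLAUSE.findall(text) if c.strip()]


for clause in split_by_punctuation("Have a blessed and joyous Easter, knowing that He is risen!"):
    print(clause)
```

Each clause can then be synthesized separately, so that near-identical clauses never share one inference call. Note that this simple pattern also splits decimal numbers such as 3.14, so numeric text would need extra handling.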

ThornbirdZhang commented 4 months ago

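After splitting at punctuation, silence still occurs when auto is passed as the text_language parameter. After I manually set the language to en, it no longer occurs. Attached are the results for the same text under auto and en. Could you look into this?

The complete results are in the attachment. contrast.zip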

KamioRinn commented 4 months ago

Cannot reproduce. Please try updating the project code and dependencies.

ThornbirdZhang commented 4 months ago

Could this be related to the chosen reference audio? I am using zero shot, and the reference audio is attached. 20240321_12_09_52_774596_priest_8sec.zip

KamioRinn commented 4 months ago

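If the front-end processed text shown on the command line is identical, then there is no difference between en and auto. You could consider improving the model and lowering dpo.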

ThornbirdZhang commented 4 months ago

Today I retried this and added logging. Now it seems that something like "let my guess" is inserted before every sentence (I am not sure of the exact words, and they differ each time). See result_20240328_14_07_13_326243_18.zip. I would appreciate your guidance.

This is the inference configuration:
Text: "As a pastor, I would like to share this message of hope and faith with you on this glorious Easter day. On this blessed day of resurrection, let us remember the triumph of light over darkness, love over hatred, and hope over despair. Just as Jesus conquered death, let us embrace the promise of new beginnings and renewed faith in our hearts. May the spirit of Easter fill your life with joy, peace, and endless possibilities. Let us rejoice in the gift of salvation and the eternal love that surrounds us all. Have a blessed and joyous Easter, knowing that He is risen!"
Inference language: "en"
Splitting method: "by punctuation"

The output log is as follows:
2024-03-28 14:06:36,894 - INFO - root - vits_api.py - line:368 - before infer, content= voiceId=18 inferText='As a pastor, I would like to share this message of hope and faith with you on this glorious Easter day. On this blessed day of resurrection, let us remember the triumph of light over darkness, love over hatred, and hope over despair. Just as Jesus conquered death, let us embrace the promise of new beginnings and renewed faith in our hearts. May the spirit of Easter fill your life with joy, peace, and endless possibilities. Let us rejoice in the gift of salvation and the eternal love that surrounds us all. Have a blessed and joyous Easter, knowing that He is risen!' inferLang='en' cutMode='auto'
T2S Decoding EOS [214 -> 304]
added one sub items, start=73640.0, stop=77460.0, text= knowing that He is risen!
after inferencing, data length =2478720, time=77460.0 ms
0.329 43.200 1.853 1.089
Actual input reference text: and we believe the our purposes for our lives. When we ask questions like, what is the dream for my life? how can God use my gifts? .
Actual input target text: As a pastor, I would like to share this message of hope and faith with you on this glorious Easter day. On this blessed day of resurrection, let us remember the triumph of light over darkness, love over hatred, and hope over despair. Just as Jesus conquered death, let us embrace the promise of new beginnings and renewed faith in our hearts. May the spirit of Easter fill your life with joy, peace, and endless possibilities. Let us rejoice in the gift of salvation and the eternal love that surrounds us all. Have a blessed and joyous Easter, knowing that He is risen!
Actual input target text (after sentence splitting): As a pastor, I would like to share this message of hope and faith with you on this glorious Easter day. On this blessed day of resurrection, let us remember the triumph of light over darkness, love over hatred, and hope over despair.
Just as Jesus conquered death, let us embrace the promise of new beginnings and renewed faith in our hearts. May the spirit of Easter fill your life with joy, peace, and endless possibilities. Let us rejoice in the gift of salvation and the eternal love that surrounds us all. Have a blessed and joyous Easter, knowing that He is risen!

Actual input target text (per sentence): As a pastor,
before inferencing, data length =0, time=0.0 ms
Front-end processed text (per sentence): As a pastor,
4%|▍ | 66/1500 [00:01<00:28, 50.90it/s]
T2S Decoding EOS [214 -> 281]
added one sub items, start=0.0, stop=2900.0, text=As a pastor,
after inferencing, data length =92800, time=2900.0 ms
Actual input target text (per sentence): I would like to share this message of hope and faith with you on this glorious Easter day.
before inferencing, data length =92800, time=2900.0 ms
Front-end processed text (per sentence): I would like to share this message of hope and faith with you on this glorious Easter day.
9%|▉ | 142/1500 [00:02<00:26, 51.66it/s]
T2S Decoding EOS [214 -> 357]
added one sub items, start=2900.0, stop=8840.0, text= I would like to share this message of hope and faith with you on this glorious Easter day.
after inferencing, data length =282880, time=8840.0 ms
Actual input target text (per sentence): On this blessed day of resurrection,
before inferencing, data length =282880, time=8840.0 ms
Front-end processed text (per sentence): On this blessed day of resurrection,
6%|▌ | 84/1500 [00:01<00:27, 51.38it/s]
T2S Decoding EOS [214 -> 299]
added one sub items, start=8840.0, stop=12460.0, text= On this blessed day of resurrection,
after inferencing, data length =398720, time=12460.0 ms
Actual input target text (per sentence): let us remember the triumph of light over darkness,
before inferencing, data length =398720, time=12460.0 ms
Front-end processed text (per sentence): let us remember the triumph of light over darkness,
7%|▋ | 103/1500 [00:01<00:27, 51.72it/s]
T2S Decoding EOS [214 -> 318]
added one sub items, start=12460.0, stop=16840.0, text= let us remember the triumph of light over darkness,
after inferencing, data length =538880, time=16840.0 ms
Actual input target text (per sentence): love over hatred,
before inferencing, data length =538880, time=16840.0 ms
Front-end processed text (per sentence): love over hatred,
5%|▌ | 77/1500 [00:01<00:27, 52.16it/s]
T2S Decoding EOS [214 -> 292]
added one sub items, start=16840.0, stop=20180.0, text= love over hatred,
after inferencing, data length =645760, time=20180.0 ms
Actual input target text (per sentence): and hope over despair.
before inferencing, data length =645760, time=20180.0 ms
Front-end processed text (per sentence): and hope over despair.
5%|▌ | 78/1500 [00:01<00:27, 51.98it/s]
T2S Decoding EOS [214 -> 293]
added one sub items, start=20180.0, stop=23560.0, text= and hope over despair.
after inferencing, data length =753920, time=23560.0 ms
Actual input target text (per sentence): Just as Jesus conquered death,
before inferencing, data length =753920, time=23560.0 ms
Front-end processed text (per sentence): Just as Jesus conquered death,
8%|▊ | 118/1500 [00:02<00:26, 52.26it/s]
T2S Decoding EOS [214 -> 333]
added one sub items, start=23560.0, stop=28540.0, text= Just as Jesus conquered death,
after inferencing, data length =913280, time=28540.0 ms
Actual input target text (per sentence): let us embrace the promise of new beginnings and renewed faith in our hearts.
before inferencing, data length =913280, time=28540.0 ms
Front-end processed text (per sentence): let us embrace the promise of new beginnings and renewed faith in our hearts.
10%|▉ | 145/1500 [00:02<00:25, 52.48it/s]
T2S Decoding EOS [214 -> 360]
added one sub items, start=28540.0, stop=34600.0, text= let us embrace the promise of new beginnings and renewed faith in our hearts.
after inferencing, data length =1107200, time=34600.0 ms
Actual input target text (per sentence): May the spirit of Easter fill your life with joy,
before inferencing, data length =1107200, time=34600.0 ms
Front-end processed text (per sentence): May the spirit of Easter fill your life with joy,
7%|▋ | 103/1500 [00:01<00:26, 52.41it/s]
T2S Decoding EOS [214 -> 318]
added one sub items, start=34600.0, stop=38980.0, text= May the spirit of Easter fill your life with joy,
after inferencing, data length =1247360, time=38980.0 ms
Actual input target text (per sentence): peace,
before inferencing, data length =1247360, time=38980.0 ms
Front-end processed text (per sentence): peace,
5%|▌ | 80/1500 [00:01<00:27, 51.73it/s]
T2S Decoding EOS [214 -> 295]
added one sub items, start=38980.0, stop=42440.0, text= peace,
after inferencing, data length =1358080, time=42440.0 ms
Actual input target text (per sentence): and endless possibilities.
before inferencing, data length =1358080, time=42440.0 ms
Front-end processed text (per sentence): and endless possibilities.
6%|▌ | 86/1500 [00:01<00:27, 51.59it/s]
T2S Decoding EOS [214 -> 301]
added one sub items, start=42440.0, stop=46140.0, text= and endless possibilities.
after inferencing, data length =1476480, time=46140.0 ms
Actual input target text (per sentence): Let us rejoice in the gift of salvation and the eternal love that surrounds us all.
before inferencing, data length =1476480, time=46140.0 ms
Front-end processed text (per sentence): Let us rejoice in the gift of salvation and the eternal love that surrounds us all.
10%|█ | 156/1500 [00:02<00:25, 52.07it/s]
T2S Decoding EOS [214 -> 371]
added one sub items, start=46140.0, stop=52640.0, text= Let us rejoice in the gift of salvation and the eternal love that surrounds us all.
after inferencing, data length =1684480, time=52640.0 ms
Actual input target text (per sentence): Have a blessed and joyous Easter,
before inferencing, data length =1684480, time=52640.0 ms
Front-end processed text (per sentence): Have a blessed and joyous Easter,
6%|▌ | 93/1500 [00:01<00:27, 51.70it/s]
T2S Decoding EOS [214 -> 308]
added one sub items, start=52640.0, stop=56620.0, text= Have a blessed and joyous Easter,
after inferencing, data length =1811840, time=56620.0 ms
Actual input target text (per sentence): knowing that He is risen!
before inferencing, data length =1811840, time=56620.0 ms
Front-end processed text (per sentence): knowing that He is risen!
6%|▌ | 86/1500 [00:01<00:27, 51.21it/s]

2024-03-28 14:07:13,978 - INFO - root - vits_api.py - line:279 - succeeded in tts, voiceid=18, inferText=As a pastor, I would like to share this message of hope and faith with you on this glorious Easter day. On this blessed day of resurrection, let us remember the triumph of light over darkness, love over hatred, and hope over despair. Just as Jesus conquered death, let us embrace the promise of new beginnings and renewed faith in our hearts. May the spirit of Easter fill your life with joy, peace, and endless possibilities. Let us rejoice in the gift of salvation and the eternal love that surrounds us all. Have a blessed and joyous Easter, knowing that He is risen!,
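
One way to narrow down where unexpected audio or silence enters is to check the segment timeline in the log for discontinuities. The sketch below is an illustration only: find_gaps is a hypothetical helper, the LOG string is a three-line excerpt copied from the log above, and it assumes the "added one sub items" lines (printed by the poster's own vits_api.py wrapper) describe where each sentence is placed in the output, in milliseconds.

```python
import re

# Excerpt of the "added one sub items" lines from the log above (last three segments).
LOG = """\
added one sub items, start=46140.0, stop=52640.0, text= Let us rejoice in the gift of salvation and the eternal love that surrounds us all.
added one sub items, start=52640.0, stop=56620.0, text= Have a blessed and joyous Easter,
added one sub items, start=73640.0, stop=77460.0, text= knowing that He is risen!
"""

SEGMENT = re.compile(r"start=([\d.]+), stop=([\d.]+), text=\s*(.*)")


def find_gaps(log, tolerance_ms=1.0):
    """Return (previous_stop, next_start, next_text) wherever segments are not contiguous."""
    segments = sorted((float(a), float(b), t) for a, b, t in SEGMENT.findall(log))
    gaps = []
    for (_, prev_stop, _), (next_start, _, text) in zip(segments, segments[1:]):
        if next_start - prev_stop > tolerance_ms:
            gaps.append((prev_stop, next_start, text))
    return gaps


for prev_stop, next_start, text in find_gaps(LOG):
    print(f"{next_start - prev_stop:.0f} ms unaccounted for before {text!r}")
```

On this excerpt, the last segment starts at 73640.0 ms although the previous one stops at 56620.0 ms, leaving 17020 ms that no logged sentence accounts for. The log does not show what fills that span. Separately, data length divided by time is 32 per millisecond in every "after inferencing" line (for example 92800 / 2900.0), which would be consistent with 32 kHz output if data length counts samples.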