Huanshere / VideoLingo

Netflix-level subtitle cutting, translation, alignment, and even dubbing - one-click fully automated AI video subtitle team
https://docs.videolingo.io
Apache License 2.0
7.19k stars 690 forks

GPT-SoVITS voice always uses the default model #230

Closed ktmswzw closed 3 weeks ago

ktmswzw commented 3 weeks ago

The GPT-SoVITS voice always uses the default model.

Verifying the model: launching the web UI with GPT-SoVITS-v2-240821>runtime\python.exe webui.py zh_CN works, and speech inference is normal:

Number of parameter: 77.61M
Number of parameter: 77.61M
Actual reference text: 你好我是娜酱,来自绝区零,是一个游戏角色,欢迎来到未来世界。 (Hello, I am Nana-chan from Zenless Zone Zero, a game character. Welcome to the future world.)
Actual target text: 你好,我要一个小灯牌,快来打赏 (Hello, I want a little light sign, come and send a tip)
Actual target text (after sentence splitting): 你好,我要一个小灯牌,快来打赏
Currently using g2pw for pinyin inference
Building prefix dict from the default dictionary ...
DEBUG:jieba_fast:Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\ktmsw\Downloads\code\GPT-SoVITS-v2-240821\TEMP\jieba.cache
DEBUG:jieba_fast:Loading model from cache C:\Users\ktmsw\Downloads\code\GPT-SoVITS-v2-240821\TEMP\jieba.cache
Loading model cost 0.271 seconds.
DEBUG:jieba_fast:Loading model cost 0.271 seconds.
Prefix dict has been built succesfully.
DEBUG:jieba_fast:Prefix dict has been built succesfully.
Actual target text (per sentence): 你好,我要一个小灯牌,快来打赏。
Text after front-end processing (per sentence): 你好,我要一个小灯牌,快来打赏.
5%|████▏ | 79/1500 [00:01<00:13, 104.93it/s]
T2S Decoding EOS [183 -> 264]
5%|████▎ | 80/1500 [00:01<00:18, 76.45it/s]
0.972 1.608 1.076 0.787

The call from VideoLingo also works. The output below shows that the configuration is read correctly:

---------------------------------------------TTS Config---------------------------------------------
device : cuda
is_half : True
version : v2
t2s_weights_path : GPT_weights_v2/nana-e15.ckpt
vits_weights_path : SoVITS_weights_v2/nana_e8_s136.pth
bert_base_path : GPT_SoVITS/pretrained_models/chinese-roberta-wwm-ext-large
cnhuhbert_base_path : GPT_SoVITS/pretrained_models/chinese-hubert-base
----------------------------------------------------------------------------------------------------
Loading Text2Semantic weights from GPT_weights_v2/nana-e15.ckpt
Loading VITS weights from SoVITS_weights_v2/nana_e8_s136.pth
Loading BERT weights from GPT_SoVITS/pretrained_models/chinese-roberta-wwm-ext-large
Loading CNHuBERT weights from GPT_SoVITS/pretrained_models/chinese-hubert-base
INFO: Started server process [18800]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:9880 (Press CTRL+C to quit)
INFO: 127.0.0.1:52036 - "GET /ping HTTP/1.1" 404 Not Found
INFO: 127.0.0.1:52039 - "GET /ping HTTP/1.1" 404 Not Found
INFO: 127.0.0.1:52042 - "GET /ping HTTP/1.1" 404 Not Found
INFO: 127.0.0.1:52045 - "GET /ping HTTP/1.1" 404 Not Found
Set seed to 3604113364
Parallel inference mode enabled
Bucketed processing mode enabled
==================================

The audio in the final video is still the default male voice; the configured nana-e15.ckpt model is not used.
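When checking whether the custom weights were actually picked up, it can help to read the TTS Config banner programmatically instead of by eye. The following is a minimal sketch, assuming the banner is printed with one "key : value" setting per line between a "TTS Config" header and a line of dashes; the function name parse_tts_config and the shortened sample log are illustrative, not part of GPT-SoVITS or VideoLingo.

```python
def parse_tts_config(log_text: str) -> dict[str, str]:
    """Collect 'key : value' pairs found inside the TTS Config banner."""
    config: dict[str, str] = {}
    in_block = False
    for line in log_text.splitlines():
        stripped = line.strip()
        if "TTS Config" in stripped:
            in_block = True  # header line of the banner
            continue
        if in_block and stripped and set(stripped) == {"-"}:
            break  # a line made only of dashes closes the banner
        if in_block and ":" in stripped:
            key, _, value = stripped.partition(":")
            config[key.strip()] = value.strip()
    return config


# Shortened sample based on the log quoted in this issue (assumed layout).
sample_log = """\
---------------------TTS Config---------------------
device : cuda
is_half : True
version : v2
t2s_weights_path : GPT_weights_v2/nana-e15.ckpt
vits_weights_path : SoVITS_weights_v2/nana_e8_s136.pth
----------------------------------------------------
"""

config = parse_tts_config(sample_log)
print(config["t2s_weights_path"])
print(config["vits_weights_path"])
```

If both paths point at the custom checkpoints, as they do here, the weights are being loaded and the cause of the wrong voice has to be looked for elsewhere.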
lonrencn commented 3 weeks ago

Did you select mode 3, which uses each sentence's own audio as the reference?

ktmswzw commented 3 weeks ago

Did you select mode 3, which uses each sentence's own audio as the reference?

Mode 3 was selected by default.

Huanshere commented 3 weeks ago

Check the config settings and try deleting the original model. Please try to debug this yourself.

ktmswzw commented 3 weeks ago

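The http://127.0.0.1:9880/ping endpoint does not exist in the new GPT-SoVITS-v2 release, so it has to be added by hand to GPT-SoVITS-v2-240821/api_v2.py:

@APP.get("/ping")
async def ping(command: str = None):
    # Answer GET /ping with HTTP 200 instead of the 404 seen in the log above
    return JSONResponse(status_code=200, content={"message": "tts success"})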
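To confirm whether a running server exposes /ping before and after patching, a small probe is enough. This is a rough sketch using only the Python standard library; the helper name endpoint_status is made up for illustration, and the address is the one from the log in this issue.

```python
import urllib.error
import urllib.request

# Bypass any system proxy settings, since the probe targets a local server.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def endpoint_status(url: str, timeout: float = 2.0):
    """Return the HTTP status code of a GET request, or None if unreachable."""
    try:
        with _opener.open(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server is running but rejected the path, e.g. 404 Not Found.
        return err.code
    except (urllib.error.URLError, OSError):
        # Nothing is listening on that address, or the request timed out.
        return None


status = endpoint_status("http://127.0.0.1:9880/ping")
if status == 200:
    print("/ping is available")
elif status is None:
    print("server not reachable")
else:
    print(f"/ping answered with HTTP {status}")
```

A result of 404 matches the log lines quoted earlier in the thread: the server is up, but the route is missing.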

ktmswzw commented 3 weeks ago

Did you select mode 3, which uses each sentence's own audio as the reference?

The option to select is the one that uses the provided reference audio, which is mode 1, not mode 3. Tested and working.
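The difference between the two modes discussed in this thread can be pictured roughly as follows. This is only an illustrative sketch, not VideoLingo's actual code: the function name, constants, and file layout are all hypothetical, and only modes 1 and 3 are modelled because those are the ones described here.

```python
from pathlib import Path

# Mode numbers follow the thread; the names are hypothetical.
REFER_MODE_PROVIDED = 1      # one supplied reference clip for every line
REFER_MODE_PER_SENTENCE = 3  # each line uses its own segment as the reference


def pick_reference_audio(refer_mode: int, sentence_index: int,
                         provided_clip: Path, segment_dir: Path) -> Path:
    """Choose which audio file is sent to the TTS server as the reference."""
    if refer_mode == REFER_MODE_PROVIDED:
        return provided_clip
    if refer_mode == REFER_MODE_PER_SENTENCE:
        # Hypothetical naming scheme: one WAV file per sentence.
        return segment_dir / f"{sentence_index}.wav"
    raise ValueError(f"unsupported refer_mode: {refer_mode}")


print(pick_reference_audio(1, 5, Path("nana_ref.wav"), Path("segments")))
```

Under this reading, mode 3 takes its reference from the source video for every line, whereas mode 1 uses the same supplied clip throughout.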