Inference completes, but the merged audio has no sound.

changhefirst commented 10 months ago

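Problem description:

The audio file produced by inference contains no sound; the waveform is a flat line. I could not find where the per-segment TTS audio is stored, so I don't know whether segmented inference actually succeeded, but the terminal output looks the same as when I ran inference on AutoDL. The final wav has the correct length, but it is silent and the waveform is a flat line.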
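To confirm the "flat line" symptom without opening an audio editor, the output can be checked programmatically. The following is a minimal sketch using only the Python standard library. It assumes the returned file is 16-bit PCM WAV, which this thread does not state; the helper names `peak_amplitude` and `is_silent` are made up for illustration.

```python
import array
import sys
import wave


def peak_amplitude(path):
    """Return the largest absolute sample value in a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            # Assumption: the output is 16-bit PCM; other widths are not handled.
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return max((abs(s) for s in samples), default=0)


def is_silent(path, threshold=0):
    """True when no sample exceeds the threshold, i.e. the waveform is flat."""
    return peak_amplitude(path) <= threshold
```

A file with the right duration but a peak amplitude of 0 matches the symptom described above: the samples were written, but every one of them is zero.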

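On AutoDL I had already run the image without problems, fine-tuned a model, and completed inference successfully. Inference there used at most 30% of a 3080's compute and about 2 GB of VRAM, so I decided to move inference to my local machine.

Local environment:

Windows 10, NVIDIA-SMI 537.70, Driver Version 537.70, CUDA Version 12.2, NVIDIA T600 with 4 GB VRAM, Python 3.10.8.

Steps and errors

Before opening this issue, I downloaded the latest code as a zip and replaced the files of the same name in the prezip directory. The problem persists; only the error output is different. All of the errors below come from the latest code.

Running go-webui.bat opens the WebUI on port 9874 without problems. I did not try any further fine-tuning steps and went straight to the fine-tuning section to launch the TTS WebUI.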

`runtime\python.exe webui.py
Running on local URL: http://0.0.0.0:9874
"E:\GPT-SoVITS\runtime\python.exe" GPT_SoVITS/inference_webui.py
Number of parameter: 77.49M
DEBUG:root:Using proactor: IocpProactor
DEBUG:root:Using proactor: IocpProactor
Running on local URL: http://0.0.0.0:9872
Number of parameter: 77.49M
Building prefix dict from the default dictionary ...
DEBUG:jieba_fast:Building prefix dict from the default dictionary ...
Loading model from cache E:\GPT-SoVITS\TEMP\jieba.cache
DEBUG:jieba_fast:Loading model from cache E:\GPT-SoVITS\TEMP\jieba.cache
Loading model cost 0.864 seconds.
DEBUG:jieba_fast:Loading model cost 0.864 seconds.
Prefix dict has been built succesfully.
DEBUG:jieba_fast:Prefix dict has been built succesfully.`

Everything is normal up to this point. Then I started TTS inference.

> ['零样本文本到语音，', ' ', '输入5秒的声音样本，即刻体验文本到语音转换。']
> ['zh', 'en', 'zh']
> 8%|██████▌ | 124/1500 [00:04<00:41, 33.40it/s]T2S Decoding EOS [162 -> 286]
> 8%|██████▌ | 124/1500 [00:04<00:45, 30.30it/s]
> E:\GPT-SoVITS\runtime\lib\site-packages\torch\functional.py:641: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error. Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\SpectralOps.cpp:867.)
> return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined]
> []
> ['少样本，', ' ', '仅需1分钟的训练数据即可微调模型，提升声音相似度和真实感。']
> ['zh', 'en', 'zh']
> 14%|███████████▍ | 217/1500 [00:06<00:38, 33.09it/s]T2S Decoding EOS [162 -> 380]
> 15%|███████████▍ | 218/1500 [00:06<00:40, 31.83it/s]
> []
> ['跨语言支持：', ' ', '支持与训练数据集不同语言的推理，目前支持英语、日语和中文。']
> ['zh', 'en', 'zh']
> 9%|███████▏ | 137/1500 [00:04<00:42, 32.39it/s]T2S Decoding EOS [162 -> 302]
> 9%|███████▎ | 140/1500 [00:04<00:44, 30.76it/s]
> []
> ['工具：', ' ', '集成工具包括声音伴奏分离、自动训练集分割、中文自动语音识别和文本标注，协助初学者创建训练数据集和模型。']
> ['zh', 'en', 'zh']
> 17%|█████████████▎ | 253/1500 [00:07<00:37, 33.08it/s]T2S Decoding EOS [162 -> 416]
> 17%|█████████████▍ | 254/1500 [00:07<00:39, 31.80it/s]
> 2.322 21.445 7.990 1.418
> ERROR:root:Exception in callback _ProactorBasePipeTransport._call_connection_lost(None)
> handle:
> Traceback (most recent call last):
> File "asyncio\events.py", line 80, in _run
> File "asyncio\proactor_events.py", line 162, in _call_connection_lost
> ConnectionResetError: [WinError 10054] 远程主机强迫关闭了一个现有的连接。

This error appears (WinError 10054: an existing connection was forcibly closed by the remote host). The returned wav file is long enough, but it has no sound and the waveform is a flat line. At first I thought this was caused by using multiple paragraphs, so I tried inferring just one sentence.

> ['零样本文本到语音，', ' ', '输入5秒的声音样本，即刻体验文本到语音转换。']
> ['zh', 'en', 'zh']
> 8%|██████▌ | 125/1500 [00:03<00:40, 33.56it/s]T2S Decoding EOS [162 -> 287]
> 8%|██████▌ | 125/1500 [00:04<00:44, 31.04it/s]
> 1.211 0.371 4.030 0.806

There is no error this time, and the terminal output stops here. But the resulting wav is still a flat line with no sound. Earlier, when I was using the code in prezip, there was a message that torch could not find ffmpeg, which did not affect running. Yet prezip does include ffmpeg, and on my machine the ffmpeg on PATH can be invoked from the command line. After switching to the latest code, that message no longer appears.

changhefirst commented 10 months ago

P.S. There are no logs in the logs directory.

RVC-Boss commented 10 months ago

The T600 may not support half-precision inference. Try changing is_half=True to False in config.py.
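As a rough illustration of that suggestion, the flag could also be chosen from the GPU name instead of being edited by hand each time. This is only a sketch, not the project's actual config.py: the helper `choose_is_half` and the deny-list `NO_HALF_GPUS` are hypothetical, and the only entry in the list is the T600 reported in this thread. In a real setup the name could come from `torch.cuda.get_device_name(0)`; a literal string is used here so the example runs without a GPU.

```python
# Hypothetical deny-list: "T600" is the only card named in this thread.
# Add other cards yourself if they also produce silent output with half precision.
NO_HALF_GPUS = ("T600",)


def choose_is_half(gpu_name, requested=True):
    """Return False when the GPU name matches the deny-list, else the requested value."""
    name = gpu_name.upper()
    if any(tag.upper() in name for tag in NO_HALF_GPUS):
        return False
    return requested


# Equivalent to setting is_half=False in config.py for this card.
is_half = choose_is_half("NVIDIA T600")
print(is_half)  # False
```

The manual edit in config.py is still the simplest fix; a lookup like this only helps when the same code is moved between machines, as happened here between a 3080 and a T600.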

changhefirst commented 10 months ago

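Thanks, the problem is solved.

It was exactly this setting.

The T600 indeed does not support half precision. I had noticed earlier that Docker lets you configure this option and went looking for it specifically. After four hours of fiddling, it turns out the setting was in this file all along. Facepalm.

P.S. The error ConnectionResetError: [WinError 10054] (an existing connection was forcibly closed by the remote host) still appears, but it no longer affects usage.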

RVC-Boss / GPT-SoVITS