RVC-Boss / GPT-SoVITS

Just 1 minute of voice data can be used to train a good TTS model! (few-shot voice cloning)
MIT License

Mac M1: synthesized speech is silent, only two breath-intake sounds can be heard #309

Closed. fedavinci closed this issue 8 months ago.

fedavinci commented 8 months ago

When launching the TTS inference WebUI, an error is reported saying the FFmpeg extension cannot be found:

"/Users/taoxu/miniconda3/envs/GPTSoVits/bin/python" GPT_SoVITS/inference_webui.py
DEBUG:torio._extension.utils:Loading FFmpeg6
DEBUG:torio._extension.utils:Failed to load FFmpeg6 extension.
Traceback (most recent call last):
  File "/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torio/_extension/utils.py", line 116, in _find_ffmpeg_extension
    ext = _find_versionsed_ffmpeg_extension(ffmpeg_ver)
  File "/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torio/_extension/utils.py", line 108, in _find_versionsed_ffmpeg_extension
    _load_lib(lib)
  File "/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torio/_extension/utils.py", line 94, in _load_lib
    torch.ops.load_library(path)
  File "/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torch/_ops.py", line 1003, in load_library
    ctypes.CDLL(path)
  File "/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/ctypes/__init__.py", line 382, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: dlopen(/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torio/lib/libtorio_ffmpeg6.so, 0x0006): Library not loaded: @rpath/libavutil.58.dylib
  Referenced from: <47D7ABF2-086E-3080-BD43-088B7CE5B6B3> /Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torio/lib/libtorio_ffmpeg6.so
  Reason: tried: '/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/lib-dynload/../../libavutil.58.dylib' (no such file), '/Users/taoxu/miniconda3/envs/GPTSoVits/bin/../lib/libavutil.58.dylib' (no such file), '/usr/local/lib/libavutil.58.dylib' (no such file), '/usr/lib/libavutil.58.dylib' (no such file, not in dyld cache)
DEBUG:torio._extension.utils:Loading FFmpeg5
DEBUG:torio._extension.utils:Failed to load FFmpeg5 extension.
Traceback (most recent call last):
  File "/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torio/_extension/utils.py", line 116, in _find_ffmpeg_extension
    ext = _find_versionsed_ffmpeg_extension(ffmpeg_ver)
  File "/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torio/_extension/utils.py", line 108, in _find_versionsed_ffmpeg_extension
    _load_lib(lib)
  File "/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torio/_extension/utils.py", line 94, in _load_lib
    torch.ops.load_library(path)
  File "/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torch/_ops.py", line 1003, in load_library
    ctypes.CDLL(path)
  File "/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/ctypes/__init__.py", line 382, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: dlopen(/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torio/lib/libtorio_ffmpeg5.so, 0x0006): Library not loaded: @rpath/libavutil.57.dylib
  Referenced from: <3ED882E0-A742-36B7-B54D-9D6FC74461A3> /Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torio/lib/libtorio_ffmpeg5.so
  Reason: tried: '/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/lib-dynload/../../libavutil.57.dylib' (no such file), '/Users/taoxu/miniconda3/envs/GPTSoVits/bin/../lib/libavutil.57.dylib' (no such file), '/usr/local/lib/libavutil.57.dylib' (no such file), '/usr/lib/libavutil.57.dylib' (no such file, not in dyld cache)
DEBUG:torio._extension.utils:Loading FFmpeg4
DEBUG:torio._extension.utils:Failed to load FFmpeg4 extension.
Traceback (most recent call last):
  File "/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torio/_extension/utils.py", line 116, in _find_ffmpeg_extension
    ext = _find_versionsed_ffmpeg_extension(ffmpeg_ver)
  File "/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torio/_extension/utils.py", line 108, in _find_versionsed_ffmpeg_extension
    _load_lib(lib)
  File "/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torio/_extension/utils.py", line 94, in _load_lib
    torch.ops.load_library(path)
  File "/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torch/_ops.py", line 1003, in load_library
    ctypes.CDLL(path)
  File "/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/ctypes/__init__.py", line 382, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: dlopen(/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torio/lib/libtorio_ffmpeg4.so, 0x0006): Library not loaded: @rpath/libavutil.56.dylib
  Referenced from: <0F44C7E0-FB42-3737-9603-D52E5202730D> /Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torio/lib/libtorio_ffmpeg4.so
  Reason: tried: '/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/lib-dynload/../../libavutil.56.dylib' (no such file), '/Users/taoxu/miniconda3/envs/GPTSoVits/bin/../lib/libavutil.56.dylib' (no such file), '/usr/local/lib/libavutil.56.dylib' (no such file), '/usr/lib/libavutil.56.dylib' (no such file, not in dyld cache)
DEBUG:torio._extension.utils:Loading FFmpeg
DEBUG:torio._extension.utils:Failed to load FFmpeg extension.
Traceback (most recent call last):
  File "/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torio/_extension/utils.py", line 116, in _find_ffmpeg_extension
    ext = _find_versionsed_ffmpeg_extension(ffmpeg_ver)
  File "/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torio/_extension/utils.py", line 106, in _find_versionsed_ffmpeg_extension
    raise RuntimeError(f"FFmpeg{version} extension is not available.")
RuntimeError: FFmpeg extension is not available.
Some weights of the model checkpoint at GPT_SoVITS/pretrained_models/chinese-hubert-base were not used when initializing HubertModel: ['encoder.pos_conv_embed.conv.weight_g', 'encoder.pos_conv_embed.conv.weight_v']
- This IS expected if you are initializing HubertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing HubertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of HubertModel were not initialized from the model checkpoint at GPT_SoVITS/pretrained_models/chinese-hubert-base and are newly initialized: ['encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'encoder.pos_conv_embed.conv.parametrizations.weight.original1']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torch/nn/utils/weight_norm.py:28: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
<All keys matched successfully>
Number of parameter: 77.49M
Running on local URL:  http://0.0.0.0:9872

The subsequently synthesized audio has no sound:

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
 14%|████████████▋                                                                                | 204/1500 [00:53<08:42,  2.48it/s]T2S Decoding EOS [128 -> 332]
 14%|████████████▋                                                                                | 204/1500 [00:53<05:39,  3.81it/s]
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torch/functional.py:660: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error.
Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/SpectralOps.cpp:879.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torch/nn/functional.py:4522: UserWarning: MPS: The constant padding of more than 3 dimensions is not currently supported natively. It uses View Ops default implementation to run. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/mps/operations/Pad.mm:474.)
  return torch._C._nn.pad(input, pad, mode, value)
1.151   0.569   53.598  7.114
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
DEBUG:pydub.converter:subprocess.call(['ffmpeg', '-y', '-i', '/var/folders/f3/n23rr3s558x0_hkct1m9w9p80000gn/T/gradio/b75baba4e0d5853d71c6d0dfae28ea864ba89580/参考音频.wav', '-acodec', 'pcm_s16le', '-vn', '-f', 'wav', '-'])
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

After setting TOKENIZERS_PARALLELISM to false, synthesis still produces no sound:

export TOKENIZERS_PARALLELISM=false
python webui.py

 13%|███████████▋                                                                                 | 188/1500 [00:45<08:51,  2.47it/s]T2S Decoding EOS [128 -> 316]
 13%|███████████▋                                                                                 | 188/1500 [00:46<05:21,  4.08it/s]
/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torch/functional.py:660: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error.
Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/SpectralOps.cpp:879.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torch/nn/functional.py:4522: UserWarning: MPS: The constant padding of more than 3 dimensions is not currently supported natively. It uses View Ops default implementation to run. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/mps/operations/Pad.mm:474.)
  return torch._C._nn.pad(input, pad, mode, value)
1.581   0.717   46.118  6.470
DEBUG:pydub.converter:subprocess.call(['ffmpeg', '-y', '-i', '/var/folders/f3/n23rr3s558x0_hkct1m9w9p80000gn/T/gradio/b75baba4e0d5853d71c6d0dfae28ea864ba89580/参考音频.wav', '-acodec', 'pcm_s16le', '-vn', '-f', 'wav', '-'])

After setting TOKENIZERS_PARALLELISM to true, synthesis still produces no sound:

export TOKENIZERS_PARALLELISM=true
python webui.py

 13%|███████████▊                                                                                 | 191/1500 [00:46<09:08,  2.38it/s]T2S Decoding EOS [128 -> 319]
 13%|███████████▊                                                                                 | 191/1500 [00:46<05:21,  4.07it/s]
/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torch/functional.py:660: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error.
Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/SpectralOps.cpp:879.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
/Users/taoxu/miniconda3/envs/GPTSoVits/lib/python3.9/site-packages/torch/nn/functional.py:4522: UserWarning: MPS: The constant padding of more than 3 dimensions is not currently supported natively. It uses View Ops default implementation to run. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/mps/operations/Pad.mm:474.)
  return torch._C._nn.pad(input, pad, mode, value)
1.423   0.718   46.951  6.457
DEBUG:pydub.converter:subprocess.call(['ffmpeg', '-y', '-i', '/var/folders/f3/n23rr3s558x0_hkct1m9w9p80000gn/T/gradio/b75baba4e0d5853d71c6d0dfae28ea864ba89580/参考音频.wav', '-acodec', 'pcm_s16le', '-vn', '-f', 'wav', '-'])
fedavinci commented 8 months ago

I changed is_half in config.py at the repository root from

is_half = eval(os.environ.get("is_half","True"))

to

is_half = eval(os.environ.get("is_half","False"))

and ran python webui.py again; synthesis is still silent.
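
As a sanity check (my own sketch, not part of the original report): since config.py reads is_half from the environment, an exported is_half variable would override whatever default is edited into the file. A minimal way to confirm which value actually takes effect, with hypothetical values:

# Sketch: confirm which precision config.py would resolve at runtime.
import os

os.environ.setdefault("is_half", "False")          # only applies if nothing exported it already
is_half = eval(os.environ.get("is_half", "True"))  # same expression as in config.py
print("is_half resolved to:", is_half)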

Lion-Wu commented 8 months ago

This ffmpeg error probably doesn't matter. I'm not sure where the torio module in the traceback comes from, I've never seen it before; installing ffmpeg via brew should take care of it. For the silent audio, have you tried other models? Or generating different content?
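
If it helps, here is a rough diagnostic sketch of mine (not from this thread) to check whether the FFmpeg libraries torio tries to dlopen are visible from the Python environment; the paths mirror the search locations shown in the log above:

# Sketch: verify that an FFmpeg install is discoverable from this environment.
import ctypes.util
import glob
import shutil
import sys

print("ffmpeg binary on PATH:   ", shutil.which("ffmpeg"))
print("libavutil via ctypes:    ", ctypes.util.find_library("avutil"))
print("libavutil in env lib dir:", glob.glob(f"{sys.prefix}/lib/libavutil.*.dylib"))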

hyhuc0079 commented 8 months ago

Did you pick the right pair of models? You didn't accidentally select a model from another training run, did you? When I get breathy output, it's usually because I chose the wrong model.

wang1011945095 commented 8 months ago

Could some Mac expert give a bit of guidance? Many thanks.

fedavinci commented 8 months ago

> This ffmpeg error probably doesn't matter. I'm not sure where the torio module in the traceback comes from, I've never seen it before; installing ffmpeg via brew should take care of it. For the silent audio, have you tried other models? Or generating different content?

I've installed ffmpeg with both brew and conda, and no matter how I switch models, the generated 8-second clip contains nothing but breath-intake sounds.

fedavinci commented 8 months ago

> Did you pick the right pair of models? You didn't accidentally select a model from another training run, did you? When I get breathy output, it's usually because I chose the wrong model.

I selected the model I trained myself, not the default one.

Lion-Wu commented 8 months ago

Then try checkpoints from different epochs, ideally a slightly lower one.

ccjackcong commented 8 months ago

Has anyone trained a reasonably good model on a Mac? Please share your experience. So far only the lowest-epoch checkpoint produces a human voice, and even that sounds poor.

ccjackcong commented 8 months ago

> Has anyone trained a reasonably good model on a Mac? Please share your experience. So far only the lowest-epoch checkpoint produces a human voice, and even that sounds poor.

I just ran training on Colab with the same material and default parameters. Inference works fine there, and the downloaded model also runs inference fine locally. Apart from slightly fast speech, it's OK. So some step of local training seems to be at fault. Could it be that the pretrained base model we downloaded doesn't support Apple silicon?

fedavinci commented 8 months ago

> Has anyone trained a reasonably good model on a Mac? Please share your experience. So far only the lowest-epoch checkpoint produces a human voice, and even that sounds poor.

> I just ran training on Colab with the same material and default parameters. Inference works fine there, and the downloaded model also runs inference fine locally. Apart from slightly fast speech, it's OK. So some step of local training seems to be at fault. Could it be that the pretrained base model we downloaded doesn't support Apple silicon?

My Mac has an Apple chip. Is yours an Intel one?

ccjackcong commented 8 months ago

> Has anyone trained a reasonably good model on a Mac? Please share your experience. So far only the lowest-epoch checkpoint produces a human voice, and even that sounds poor.

> I just ran training on Colab with the same material and default parameters. Inference works fine there, and the downloaded model also runs inference fine locally. Apart from slightly fast speech, it's OK. So some step of local training seems to be at fault. Could it be that the pretrained base model we downloaded doesn't support Apple silicon?

> My Mac has an Apple chip. Is yours an Intel one?

M2 chip. Local training fails on every attempt; the output is nothing but a snoring-like noise. Some say it may be because the base model we downloaded doesn't support Apple silicon. A model trained on Colab and then downloaded runs inference fine locally.
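
A speculative check of my own (nobody in this thread ran it): scan the locally trained checkpoint for NaN/Inf weights, which could explain noise-only output. The path and the "weight" key are assumptions about the checkpoint layout, not something confirmed here.

# Sketch: look for non-finite values in a locally trained checkpoint.
import torch

ckpt = torch.load("SoVITS_weights/my_model.pth", map_location="cpu")  # placeholder path
state = ckpt["weight"] if isinstance(ckpt, dict) and "weight" in ckpt else ckpt

bad = [
    name
    for name, value in state.items()
    if torch.is_tensor(value) and value.is_floating_point() and not torch.isfinite(value).all()
]
print(f"checked {len(state)} entries, {len(bad)} contain NaN/Inf")
print(bad[:20])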