Artrajz / vits-simple-api

A simple VITS HTTP API, developed by extending Moegoe with additional features.
GNU Affero General Public License v3.0
781 stars, 117 forks

I can't use version 2.1 vits2 models #111

Closed: ArackLiceve closed this issue 5 months ago

ArackLiceve commented 9 months ago

Runtime environment

Problem description

Artrajz commented 9 months ago

Could you share the model-loading error from the log?

ArackLiceve commented 9 months ago

This should be the model-loading error log you asked for:

2023-12-03 22:42:32 [INFO] [_internal._log:187] WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.

2023-12-03 22:44:45 [INFO] [_internal._log:187] 192.168.42.3 - - [03/Dec/2023 22:44:45] "POST /admin/load_model HTTP/1.1" 500 -

Artrajz commented 9 months ago

The log says the wav2vec2-large-robust-12-ft-emotion-msp-dim model is missing. You can download it here. After downloading pytorch_model.bin, place it in the bert_vits2/emotional/wav2vec2-large-robust-12-ft-emotion-msp-dim folder.
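To confirm the file landed where the loader expects it, a small check like the one below can help. It is only a sketch: the folder and file name come from the reply above, while the `project_root` argument and the helper name `missing_emotion_files` are made up for illustration, and the real loader may need more files than the one listed here.

```python
from pathlib import Path

# Folder named in the reply above, relative to the project root.
EMOTION_MODEL_DIR = Path("bert_vits2/emotional/wav2vec2-large-robust-12-ft-emotion-msp-dim")


def missing_emotion_files(project_root, required=("pytorch_model.bin",)):
    """Return the required file names that are absent from the emotion model folder.

    `project_root` is an assumption: the directory that contains bert_vits2/.
    """
    model_dir = Path(project_root) / EMOTION_MODEL_DIR
    return [name for name in required if not (model_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_emotion_files(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Emotion model files found.")
```

Run it from the project root before calling /admin/load_model again.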

Ikaros-521 commented 9 months ago

Borrowing this thread: loading a v2.1 model for the first time raises an error

File "D:\aizb\vits-simple-api-windows-gpu-v0.6.0\bert_vits2\get_emo.py", line 83, in get_emo
    wav, sr = librosa.load(audio, 16000)
TypeError: load() takes 1 positional argument but 2 were given

After changing bert_vits2\get_emo.py as follows, it runs normally

wav, sr = librosa.load(audio, sr=16000)
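The TypeError message suggests that, in the installed librosa, `load()` accepts only the path positionally and takes `sr` as a keyword. The sketch below reproduces that behaviour with a stand-in function; it is not librosa itself, and the stand-in's default of 22050 is illustrative.

```python
def load(path, *, sr=22050):
    """Stand-in for a loader whose sr parameter is keyword-only (not the real librosa.load)."""
    return path, sr


def try_positional():
    """Call the stand-in the old way and return the resulting error message, if any."""
    try:
        load("1.wav", 16000)  # sr passed positionally, as in the original get_emo.py
    except TypeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(try_positional())
    print(load("1.wav", sr=16000))  # keyword form is accepted
```

Passing `sr` by keyword is accepted by both signature styles, so the one-line change above should not depend on which version is installed.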

Loading the audio succeeds the first time, but on the second inference it fails with an error

2023-12-07 00:19:13 [ERROR] [app.log_exception:1744] Exception on /voice/bert-vits2 [POST]
Traceback (most recent call last):
  File "D:\aizb\vits-simple-api-windows-gpu-v0.6.0\py310\lib\site-packages\flask\app.py", line 2528, in wsgi_app
    response = self.full_dispatch_request()
  File "D:\aizb\vits-simple-api-windows-gpu-v0.6.0\py310\lib\site-packages\flask\app.py", line 1825, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "D:\aizb\vits-simple-api-windows-gpu-v0.6.0\py310\lib\site-packages\flask\app.py", line 1823, in full_dispatch_request
    rv = self.dispatch_request()
  File "D:\aizb\vits-simple-api-windows-gpu-v0.6.0\py310\lib\site-packages\flask\app.py", line 1799, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "D:\aizb\vits-simple-api-windows-gpu-v0.6.0\tts_app\voice_api\auth.py", line 9, in check_api_key
    return func(*args, **kwargs)
  File "D:\aizb\vits-simple-api-windows-gpu-v0.6.0\tts_app\voice_api\views.py", line 480, in voice_bert_vits2_api
    audio = tts_manager.bert_vits2_infer(state)
  File "D:\aizb\vits-simple-api-windows-gpu-v0.6.0\TTSManager.py", line 375, in bert_vits2_infer
    audio = model.infer(sentence, state["id"], lang, state["sdp_ratio"], state["noise"],
  File "D:\aizb\vits-simple-api-windows-gpu-v0.6.0\bert_vits2\bert_vits2.py", line 171, in infer
    emo = self.get_emo_(reference_audio, emotion).to(self.device).unsqueeze(0)
  File "D:\aizb\vits-simple-api-windows-gpu-v0.6.0\bert_vits2\bert_vits2.py", line 161, in get_emo_
    get_emo(reference_audio, self.emotion_model, self.processor)) if reference_audio else torch.Tensor(
  File "D:\aizb\vits-simple-api-windows-gpu-v0.6.0\bert_vits2\get_emo.py", line 84, in get_emo
    wav, sr = librosa.load(audio, sr=16000)
  File "D:\aizb\vits-simple-api-windows-gpu-v0.6.0\py310\lib\site-packages\librosa\core\audio.py", line 185, in load
    raise exc
  File "D:\aizb\vits-simple-api-windows-gpu-v0.6.0\py310\lib\site-packages\librosa\core\audio.py", line 175, in load
    y, sr_native = __soundfile_load(path, offset, duration, dtype)
  File "D:\aizb\vits-simple-api-windows-gpu-v0.6.0\py310\lib\site-packages\librosa\core\audio.py", line 208, in __soundfile_load
    context = sf.SoundFile(path)
  File "D:\aizb\vits-simple-api-windows-gpu-v0.6.0\py310\lib\site-packages\soundfile.py", line 658, in __init__
    self._file = self._open(file, mode_int, closefd)
  File "D:\aizb\vits-simple-api-windows-gpu-v0.6.0\py310\lib\site-packages\soundfile.py", line 1216, in _open
    raise LibsndfileError(err, prefix="Error opening {0!r}: ".format(self.name))
soundfile.LibsndfileError: Error opening <FileStorage: '1.wav' ('audio/wav')>: Format not recognised.
2023-12-07 00:19:13 [INFO] [_internal._log:187] 127.0.0.1 - - [07/Dec/2023 00:19:13] "POST /voice/bert-vits2 HTTP/1.1" 500 -

Artrajz commented 9 months ago

File "D:\aizb\vits-simple-api-windows-gpu-v0.6.0\bert_vits2\get_emo.py", line 83, in get_emo
    wav, sr = librosa.load(audio, 16000)
TypeError: load() takes 1 positional argument but 2 were given

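This is probably a librosa version issue. requirements.txt does not pin a version, so the deployment package automatically downloaded the latest one. To switch versions inside the deployment package: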

py310\python.exe -m pip install librosa==0.9.1
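To see which librosa version the bundled interpreter actually has before or after running the command above, something like the following could be used. It relies only on the standard library; the helper names are illustrative, and the pin `0.9.1` is the version suggested in this reply.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(version, pin="0.9.1"):
    """True only when the installed version equals the pinned one exactly."""
    return version == pin


if __name__ == "__main__":
    version = installed_version("librosa")
    status = "matches 0.9.1" if matches_pin(version) else "differs from 0.9.1 or not installed"
    print("librosa:", version, "(" + status + ")")
```

Run it with the same interpreter as the pip command (py310\python.exe) so that it inspects the deployment package's site-packages rather than a system Python.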

soundfile.LibsndfileError: Error opening <FileStorage: '1.wav' ('audio/wav')>: Format not recognised.

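I'm not sure what the problem is. On my side, loading the uploaded audio works no matter how many inferences I run. It may also be a librosa version issue.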
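The thread does not establish the cause of the "Format not recognised" error. One possibility, offered purely as an assumption, is that the uploaded file object is a stream whose read position is no longer at the start when it is parsed a second time. The sketch below illustrates that failure mode with the standard library only: `io.BytesIO` stands in for the uploaded file and the `wave` module stands in for soundfile.

```python
import io
import wave


def make_wav_bytes(n_frames=160, sample_rate=16000):
    """Build a tiny mono 16-bit WAV file in memory (silence)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)
        writer.setframerate(sample_rate)
        writer.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()


def count_frames(stream):
    """Parse a WAV stream from its current position and consume all frames."""
    with wave.open(stream, "rb") as reader:
        n = reader.getnframes()
        reader.readframes(n)
        return n


def count_frames_rewound(stream):
    """Rewind the stream to the start before parsing it."""
    stream.seek(0)
    return count_frames(stream)


def demo():
    stream = io.BytesIO(make_wav_bytes())
    first = count_frames(stream)  # stream starts at position 0, so this parses
    try:
        count_frames(stream)  # stream is now at EOF, no header left to read
        second_ok = True
    except (wave.Error, EOFError):
        second_ok = False
    third = count_frames_rewound(stream)  # rewinding makes it parse again
    return first, second_ok, third


if __name__ == "__main__":
    print(demo())
```

If this is what happens here, rewinding the upload with `seek(0)` (or copying its bytes into a fresh `io.BytesIO`) before the `librosa.load` call in get_emo.py would be worth trying. That is a guess to test, not a confirmed fix.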

Ikaros-521 commented 9 months ago

OK, worth a try 🥰