Artrajz / vits-simple-api

A simple VITS HTTP API, developed by extending Moegoe with additional features.
GNU Affero General Public License v3.0
822 stars, 121 forks

Error when generating audio in the player with Bert-VITS2: unable to get audio data #113

Closed cat-undertale closed 7 months ago

cat-undertale commented 11 months ago

Runtime environment

Problem description

2023-12-08 15:38:55 [INFO] [views.voice_bert_vits2_api:421] [BERT-VITS2] len:10 text:你好我是数字人小助手
2023-12-08 15:38:55 [INFO] [langid.load_model:162] initializing identifier
2023-12-08 15:38:58 [ERROR] [app.log_exception:1744] Exception on /voice/bert-vits2 [POST]
Traceback (most recent call last):
  File "E:\AIGC\VITS\vits-simple-api-windows-gpu-v0.6.0\py310\lib\site-packages\flask\app.py", line 2528, in wsgi_app
    response = self.full_dispatch_request()
  File "E:\AIGC\VITS\vits-simple-api-windows-gpu-v0.6.0\py310\lib\site-packages\flask\app.py", line 1825, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "E:\AIGC\VITS\vits-simple-api-windows-gpu-v0.6.0\py310\lib\site-packages\flask\app.py", line 1823, in full_dispatch_request
    rv = self.dispatch_request()
  File "E:\AIGC\VITS\vits-simple-api-windows-gpu-v0.6.0\py310\lib\site-packages\flask\app.py", line 1799, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "E:\AIGC\VITS\vits-simple-api-windows-gpu-v0.6.0\tts_app\voice_api\auth.py", line 9, in check_api_key
    return func(*args, **kwargs)
  File "E:\AIGC\VITS\vits-simple-api-windows-gpu-v0.6.0\tts_app\voice_api\views.py", line 480, in voice_bert_vits2_api
    audio = tts_manager.bert_vits2_infer(state)
  File "E:\AIGC\VITS\vits-simple-api-windows-gpu-v0.6.0\TTSManager.py", line 375, in bert_vits2_infer
    audio = model.infer(sentence, state["id"], lang, state["sdp_ratio"], state["noise"],
  File "E:\AIGC\VITS\vits-simple-api-windows-gpu-v0.6.0\bert_vits2\bert_vits2.py", line 171, in infer
    emo = self.get_emo_(reference_audio, emotion).to(self.device).unsqueeze(0)
  File "E:\AIGC\VITS\vits-simple-api-windows-gpu-v0.6.0\bert_vits2\bert_vits2.py", line 161, in get_emo_
    get_emo(reference_audio, self.emotion_model, self.processor)) if reference_audio else torch.Tensor(
TypeError: must be real number, not NoneType

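Steps to reproduce

I unpacked the deployment package, dragged a model (non-emo) trained with someone else's one-click package into the Model folder, and edited config.json. Then I opened the web UI and clicked generate in the player without changing any parameters.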

Artrajz commented 11 months ago

You need to load the model by following the steps in the documentation. You can also load it from the admin backend.

cat-undertale commented 11 months ago

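I did load the model by following the documentation, but clicking generate still fails to get any audio data. (Attached screenshots: 捕获, 捕获2)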

Artrajz commented 11 months ago

If this is not a v2.1 model, then the version number is probably missing from the model's config.json. Any model without version information is loaded as v2.1.
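To check which version a model will be treated as, a minimal sketch like the following can help. It assumes the version is stored under a top-level "version" key in config.json (an assumption, not confirmed in this thread) and mirrors the fallback to v2.1 described above. The function name detect_version is hypothetical.

```python
import json


def detect_version(config_text: str, fallback: str = "2.1") -> str:
    """Return the version declared in a config.json string.

    Assumption: the version lives under a top-level "version" key.
    When the key is missing, fall back to "2.1", matching the
    behaviour described in this thread.
    """
    config = json.loads(config_text)
    return str(config.get("version", fallback))


# A config without version information is treated as v2.1.
print(detect_version('{"train": {}}'))
# A config that declares its version keeps it.
print(detect_version('{"version": "2.0", "train": {}}'))
```

If the first call matches your model's situation, adding the correct version to config.json is the thing to try before reloading the model.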

AuraElicase commented 11 months ago

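Runtime environment

Problem description

Steps to reproduce

I trained the model myself with Bert-VITS2. config.json contains the version number, and it is a trilingual model.

image

Following the documentation, I started the admin backend page and loaded the model.

image

Following another issue, #111, I reinstalled librosa==0.9.1 to fix TypeError: load() takes 1 positional argument but 2 were given, and placed pytorch_model.bin in vits-simple-api-windows-gpu-v0.6.0\bert_vits2\emotional\wav2vec2-large-robust-12-ft-emotion-msp-dim. After that I ran into the error above. I searched for it but found no solution. Could you help me resolve it? Many thanks.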

Artrajz commented 11 months ago

  File "D:\VITS\vits-simple-api-windows-gpu-v0.6.0\bert_vits2\bert_vits2.py", line 161, in get_emo_
    get_emo(reference_audio, self.emotion_model, self.processor)) if reference_audio else torch.Tensor(
TypeError: must be real number, not NoneType

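The error shows that neither emotion nor reference_audio was passed in. In other words, you need to fill in emotion or reference_audio when running inference from the web UI.

This is admittedly inconvenient. I forgot to give it a default value when I wrote it.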
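One way to picture the missing default is the sketch below. It is not the project's code: resolve_emotion and DEFAULT_EMOTION are hypothetical names, and the default value 0 is an assumption chosen only for illustration. The idea is to decide on a usable emotion value before it reaches the tensor construction that currently fails on None.

```python
from typing import Optional

# Hypothetical default; the project may well pick a different value.
DEFAULT_EMOTION = 0


def resolve_emotion(emotion: Optional[int],
                    reference_audio: Optional[bytes]) -> Optional[int]:
    """Pick the emotion value to use for inference.

    Returns None when reference audio is supplied, because in that case
    the emotion would be derived from the audio instead of a number.
    """
    if reference_audio:
        return None
    if emotion is None:
        # Neither input was given: use the default instead of passing None on.
        return DEFAULT_EMOTION
    return int(emotion)


print(resolve_emotion(None, None))   # falls back to the default
print(resolve_emotion(1, None))      # an explicit value is kept
```

With a guard like this, a request that leaves both fields empty would no longer hand None to the tensor constructor.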

RayenAlex commented 11 months ago

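Runtime environment

  • Operating system (Linux/macOS/Windows): Windows
  • Deployment method (Docker/Windows quick deployment package/self-built environment): quick deployment package
  • Python version (optional if using the deployment package):
  • Code version/deployment package version: 0.6.0

Problem description

  • Popup in the web UI: image
  • Cmd command line: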
2023-12-10 16:12:54 [INFO] [_internal._log:187] Press CTRL+C to quit
2023-12-10 16:13:52 [INFO] [_internal._log:187] 127.0.0.1 - - [10/Dec/2023 16:13:52] "GET /voice/speakers HTTP/1.1" 200 -
2023-12-10 16:14:07 [INFO] [views.voice_vits_api:76] [VITS] id:0 format:wav lang:auto length:1 noise:0.33 noisew:0.4 segment_size:50
2023-12-10 16:14:07 [INFO] [views.voice_vits_api:78] [VITS] len:4 text:text
2023-12-10 16:14:07 [INFO] [views.voice_vits_api:89] [VITS] speaker id 0 does not exist
2023-12-10 16:14:07 [INFO] [_internal._log:187] 127.0.0.1 - - [10/Dec/2023 16:14:07] "GET /voice/vits?text=text HTTP/1.1" 400 -
2023-12-10 16:14:29 [INFO] [_internal._log:187] 127.0.0.1 - - [10/Dec/2023 16:14:29] "GET / HTTP/1.1" 200 -
2023-12-10 16:14:29 [INFO] [_internal._log:187] 127.0.0.1 - - [10/Dec/2023 16:14:29] "GET /static/css/plugins/bootstrap.min.css HTTP/1.1" 304 -
2023-12-10 16:14:29 [INFO] [_internal._log:187] 127.0.0.1 - - [10/Dec/2023 16:14:29] "GET /static/css/style.css HTTP/1.1" 304 -
2023-12-10 16:14:29 [INFO] [_internal._log:187] 127.0.0.1 - - [10/Dec/2023 16:14:29] "GET /static/js/index.js HTTP/1.1" 304 -
2023-12-10 16:14:29 [INFO] [_internal._log:187] 127.0.0.1 - - [10/Dec/2023 16:14:29] "GET /static/css/fileinput.min.css HTTP/1.1" 304 -
2023-12-10 16:14:29 [INFO] [_internal._log:187] 127.0.0.1 - - [10/Dec/2023 16:14:29] "GET /static/css/pages/index.css HTTP/1.1" 304 -
2023-12-10 16:14:29 [INFO] [_internal._log:187] 127.0.0.1 - - [10/Dec/2023 16:14:29] "GET /static/js/plugins/fileinput.min.js HTTP/1.1" 304 -
2023-12-10 16:14:29 [INFO] [_internal._log:187] 127.0.0.1 - - [10/Dec/2023 16:14:29] "GET /static/js/plugins/jquery-3.7.1.min.js HTTP/1.1" 304 -
2023-12-10 16:14:29 [INFO] [_internal._log:187] 127.0.0.1 - - [10/Dec/2023 16:14:29] "GET /static/js/plugins/bootstrap.bundle.min.js HTTP/1.1" 304 -
2023-12-10 16:14:29 [INFO] [_internal._log:187] 127.0.0.1 - - [10/Dec/2023 16:14:29] "GET /voice/speakers HTTP/1.1" 200 -
2023-12-10 16:14:29 [INFO] [_internal._log:187] 127.0.0.1 - - [10/Dec/2023 16:14:29] "POST /voice/default_parameter HTTP/1.1" 200 -
2023-12-10 16:14:36 [INFO] [views.voice_bert_vits2_api:418] [BERT-VITS2] id:0 format:mp3 lang:auto length:1 noise:0.33 noisew:0.4 sdp_ratio:0.2 segment_size:50 length_zh:0 length_ja:0 length_en:0
2023-12-10 16:14:36 [INFO] [views.voice_bert_vits2_api:421] [BERT-VITS2] len:2 text:你好
2023-12-10 16:14:36 [INFO] [langid.load_model:162] initializing identifier
2023-12-10 16:14:41 [ERROR] [app.log_exception:1744] Exception on /voice/bert-vits2 [POST]
Traceback (most recent call last):
  File "D:\VITS\vits-simple-api-windows-gpu-v0.6.0\py310\lib\site-packages\flask\app.py", line 2528, in wsgi_app
    response = self.full_dispatch_request()
  File "D:\VITS\vits-simple-api-windows-gpu-v0.6.0\py310\lib\site-packages\flask\app.py", line 1825, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "D:\VITS\vits-simple-api-windows-gpu-v0.6.0\py310\lib\site-packages\flask\app.py", line 1823, in full_dispatch_request
    rv = self.dispatch_request()
  File "D:\VITS\vits-simple-api-windows-gpu-v0.6.0\py310\lib\site-packages\flask\app.py", line 1799, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "D:\VITS\vits-simple-api-windows-gpu-v0.6.0\tts_app\voice_api\auth.py", line 9, in check_api_key
    return func(*args, **kwargs)
  File "D:\VITS\vits-simple-api-windows-gpu-v0.6.0\tts_app\voice_api\views.py", line 480, in voice_bert_vits2_api
    audio = tts_manager.bert_vits2_infer(state)
  File "D:\VITS\vits-simple-api-windows-gpu-v0.6.0\TTSManager.py", line 375, in bert_vits2_infer
    audio = model.infer(sentence, state["id"], lang, state["sdp_ratio"], state["noise"],
  File "D:\VITS\vits-simple-api-windows-gpu-v0.6.0\bert_vits2\bert_vits2.py", line 171, in infer
    emo = self.get_emo_(reference_audio, emotion).to(self.device).unsqueeze(0)
  File "D:\VITS\vits-simple-api-windows-gpu-v0.6.0\bert_vits2\bert_vits2.py", line 161, in get_emo_
    get_emo(reference_audio, self.emotion_model, self.processor)) if reference_audio else torch.Tensor(
TypeError: must be real number, not NoneType
2023-12-10 16:14:41 [INFO] [_internal._log:187] 127.0.0.1 - - [10/Dec/2023 16:14:41] "POST /voice/bert-vits2 HTTP/1.1" 500 -

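I ran into this problem before as well. My fix was to put a number in the emotion field, for example emotion=1, and after that the audio was returned normally.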
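For anyone calling the API directly instead of using the web UI, the same workaround might look like the sketch below. The endpoint path /voice/bert-vits2 and the POST method come from the logs in this thread. The host and port in BASE_URL, the form-encoded body, and the field names text, id and emotion are assumptions, so adjust them to your deployment. The helper build_bert_vits2_request is a hypothetical name, and it only builds the request without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: adjust host and port to wherever your server is listening.
BASE_URL = "http://127.0.0.1:23456"


def build_bert_vits2_request(text: str, speaker_id: int = 0,
                             emotion: int = 1) -> Request:
    """Build (but do not send) a POST request that sets emotion explicitly."""
    # Assumption: the server reads these values as form fields.
    fields = {"text": text, "id": speaker_id, "emotion": emotion}
    body = urlencode(fields).encode("utf-8")
    return Request(f"{BASE_URL}/voice/bert-vits2", data=body, method="POST")


request = build_bert_vits2_request("你好")
print(request.get_method(), request.full_url)
```

Passing the result to urllib.request.urlopen would send it. The response should then contain audio rather than the 500 error seen in the log.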

AuraElicase commented 11 months ago

Thank you very much, the problem is solved.