INFO:root:Loading yaml from /home/xxx/vits-simple-api/config.yml
Building prefix dict from the default dictionary ...
DEBUG:jieba:Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
DEBUG:jieba:Loading model from cache /tmp/jieba.cache
Loading model cost 0.915 seconds.
DEBUG:jieba:Loading model cost 0.915 seconds.
Prefix dict has been built successfully.
DEBUG:jieba:Prefix dict has been built successfully.
2024-01-03 13:34:13 [INFO] [model_handler.load_bert:125] Loading BERT model: /home/xxx/vits-simple-api/bert_vits2/bert/deberta-v2-large-japanese-char-wwm
2024-01-03 13:34:19 [INFO] [model_handler.load_bert:130] Success loading: /home/xxx/vits-simple-api/bert_vits2/bert/deberta-v2-large-japanese-char-wwm
2024-01-03 13:34:19 [INFO] [model_handler.load_bert:125] Loading BERT model: /home/xxx/vits-simple-api/bert_vits2/bert/deberta-v3-large
/home/xxx/anaconda3/envs/vits-simple-api/lib/python3.10/site-packages/transformers/convert_slow_tokenizer.py:473: UserWarning: The sentencepiece tokenizer that you are converting to a fast tokenizer uses the byte fallback option which is not implemented in the fast tokenizers. In practice this means that the fast version of the tokenizer can produce unknown tokens whereas the sentencepiece version would have converted these unknown tokens into a sequence of byte tokens matching the original piece of text.
warnings.warn(
Some weights of DebertaV2ForMaskedLM were not initialized from the model checkpoint at /home/xxx/vits-simple-api/bert_vits2/bert/deberta-v3-large and are newly initialized: ['cls.predictions.decoder.bias', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
2024-01-03 13:34:28 [INFO] [model_handler.load_bert:130] Success loading: /home/xxx/vits-simple-api/bert_vits2/bert/deberta-v3-large
2024-01-03 13:34:28 [INFO] [model_handler.load_bert:125] Loading BERT model: /home/xxx/vits-simple-api/bert_vits2/bert/chinese-roberta-wwm-ext-large
Some weights of the model checkpoint at /home/xxx/vits-simple-api/bert_vits2/bert/chinese-roberta-wwm-ext-large were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'bert.pooler.dense.bias', 'bert.pooler.dense.weight']
This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
2024-01-03 13:34:36 [INFO] [model_handler.load_bert:130] Success loading: /home/xxx/vits-simple-api/bert_vits2/bert/chinese-roberta-wwm-ext-large
/home/xxx/anaconda3/envs/vits-simple-api/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
2024-01-03 13:34:42 [INFO] [utils.load_checkpoint:65] Loaded checkpoint '/home/xxx/vits-simple-api/Model/xxx/G_24750.pth' (iteration 188)
2024-01-03 13:34:42 [INFO] [ModelManager._load_model_from_path:229] model_type:BERT-VITS2 model_id:0 n_speakers:1 model_path:/home/xxx/vits-simple-api/Model/xxx/G_24750.pth
Runtime environment
Problem description
The model is a Chinese-English bilingual model trained with Bert-vits2-V2.3; mixed Chinese-English text generation works fine in its bundled gradio webui.
When the model is loaded with vits-simple-api, a few warnings appear: the BertForMaskedLM / DebertaV2ForMaskedLM initialization notices and the torch.nn.utils.weight_norm deprecation warning already shown in the startup log above.
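As far as I can tell, those messages are the standard transformers notices for loading a plain encoder checkpoint into a `*ForMaskedLM` class (the unused NSP/pooler weights are dropped and the MLM head is freshly initialized), so they normally do not affect BERT feature extraction by themselves. A minimal sketch, outside vits-simple-api, that reproduces and optionally silences them; the checkpoint path is copied from the log above:

```python
# Sketch only (not vits-simple-api code): load the same BERT checkpoint the way
# transformers does, to confirm the warning concerns only the MLM/NSP heads.
import transformers
from transformers import AutoModelForMaskedLM, AutoTokenizer

path = "/home/xxx/vits-simple-api/bert_vits2/bert/chinese-roberta-wwm-ext-large"

# Optional: hide the "Some weights ... were not used" notice once the model is
# confirmed to load and run correctly.
transformers.logging.set_verbosity_error()

tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForMaskedLM.from_pretrained(path)
```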
The version field inside the model's config.json is "version": "2.3".
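For clarity, that value comes from the checkpoint's config.json, which vits-simple-api presumably uses to pick the Bert-VITS2 version. A minimal sketch to print it (the exact path of the config.json next to G_24750.pth is my assumption):

```python
# Print the version field that the loader reads; adjust the path to the
# config.json that sits alongside the G_*.pth checkpoint.
import json

with open("/home/xxx/vits-simple-api/Model/xxx/config.json", encoding="utf-8") as f:
    cfg = json.load(f)

print(cfg.get("version"))  # expected: "2.3"
```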
The Chinese-English bilingual model is recognized as a trilingual Chinese-Japanese-English model (consistent with the startup log above, which loads all three BERT models: Japanese, English, and Chinese). Chinese inference works fine, but as soon as the text contains any English letter the webui reports the error "无法获取音频数据" (failed to get audio data). Japanese text, which the model was never trained on, generates normally and actually sounds quite good.
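In case it helps triage, the same failure should be reproducible through the HTTP API directly. This is only a reproduction sketch: the /voice/bert-vits2 endpoint, its parameter names, and the default port 23456 are my assumptions based on the project README and may need adjusting to your deployment:

```python
# Reproduction sketch with requests; endpoint, parameters, and port are
# assumptions, not verified against this exact vits-simple-api version.
import requests

BASE = "http://127.0.0.1:23456"

def tts(text, lang="auto"):
    r = requests.get(f"{BASE}/voice/bert-vits2",
                     params={"text": text, "id": 0, "lang": lang, "format": "wav"})
    print(text, "->", r.status_code, r.headers.get("Content-Type"))
    return r

tts("你好,今天天气不错。")    # Chinese only: works
tts("你好,hello world。")     # contains English letters: fails for me
```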
Error log for English input:
I have already looked at issues #111 and #113, but they did not solve this. I suspect it is a 2.3 compatibility problem, and I am considering rolling back to 2.1 or a more reliable Bert-VITS2 version and retraining.