bert-vits2 model does not infer correctly

yuffiesakiya commented 11 months ago

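Environment

Operating system (Linux/macOS/Windows): Windows 11
Deployment method (Docker / Windows quick-deploy package / self-built environment): quick-deploy package
Python version (optional when using the deploy package):
Code version / deploy package version: 0.6.0 (latest)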

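Problem description

After loading the bert-vits2 model, inference reproduces the character's voice correctly, but the articulation is unclear and the words come out slurred. (The model infers without problems in its original repository.) Model: https://huggingface.co/spaces/XzJosh/Wenjing-Bert-VITS2/tree/main/logs/Azuma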

Steps to reproduce

INFO:root:Loading yaml from C:\Users\LV HENG\Desktop\vits-simple-api1205\config.yml
Building prefix dict from the default dictionary ...
DEBUG:jieba:Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\LVHENG~1\AppData\Local\Temp\jieba.cache
DEBUG:jieba:Loading model from cache C:\Users\LVHENG~1\AppData\Local\Temp\jieba.cache
Loading model cost 0.430 seconds.
DEBUG:jieba:Loading model cost 0.430 seconds.
Prefix dict has been built successfully.
DEBUG:jieba:Prefix dict has been built successfully.
2023-12-06 22:19:15 [INFO] [bert_handler.load_bert:93] Loading BERT model: C:\Users\LV HENG\Desktop\vits-simple-api1205\bert_vits2/bert/bert-base-japanese-v3
2023-12-06 22:19:15 [ERROR] [bert_handler.load_bert:101] Failed loading C:\Users\LV HENG\Desktop\vits-simple-api1205\bert_vits2/bert/bert-base-japanese-v3. You need to install fugashi to use MecabTokenizer. See https://pypi.org/project/fugashi/ for installation.
2023-12-06 22:19:15 [INFO] [bert_handler.load_bert:102] Trying to download.
2023-12-06 22:19:16 [INFO] [bert_handler._download_model:86] File already exists and verified successfully!
2023-12-06 22:19:16 [INFO] [bert_handler.load_bert:93] Loading BERT model: C:\Users\LV HENG\Desktop\vits-simple-api1205\bert_vits2/bert/bert-base-japanese-v3
2023-12-06 22:19:16 [ERROR] [bert_handler.load_bert:101] Failed loading C:\Users\LV HENG\Desktop\vits-simple-api1205\bert_vits2/bert/bert-base-japanese-v3. You need to install fugashi to use MecabTokenizer. See https://pypi.org/project/fugashi/ for installation.
2023-12-06 22:19:16 [INFO] [bert_handler.load_bert:102] Trying to download.
2023-12-06 22:19:17 [INFO] [bert_handler._download_model:86] File already exists and verified successfully!
2023-12-06 22:19:17 [INFO] [bert_handler.load_bert:93] Loading BERT model: C:\Users\LV HENG\Desktop\vits-simple-api1205\bert_vits2/bert/bert-base-japanese-v3
2023-12-06 22:19:17 [ERROR] [bert_handler.load_bert:101] Failed loading C:\Users\LV HENG\Desktop\vits-simple-api1205\bert_vits2/bert/bert-base-japanese-v3. You need to install fugashi to use MecabTokenizer. See https://pypi.org/project/fugashi/ for installation.
2023-12-06 22:19:17 [INFO] [bert_handler.load_bert:102] Trying to download.
2023-12-06 22:19:17 [INFO] [bert_handler._download_model:86] File already exists and verified successfully!
2023-12-06 22:19:17 [ERROR] [bert_handler.load_bert:109] Failed to load C:\Users\LV HENG\Desktop\vits-simple-api1205\bert_vits2/bert/bert-base-japanese-v3 after 3 retries.
2023-12-06 22:19:17 [INFO] [bert_handler.load_bert:93] Loading BERT model: C:\Users\LV HENG\Desktop\vits-simple-api1205\bert_vits2/bert/chinese-roberta-wwm-ext-large
Some weights of the model checkpoint at C:\Users\LV HENG\Desktop\vits-simple-api1205\bert_vits2/bert/chinese-roberta-wwm-ext-large were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'cls.seq_relationship.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.weight']
This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
2023-12-06 22:19:23 [INFO] [bert_handler.load_bert:98] Success loading: C:\Users\LV HENG\Desktop\vits-simple-api1205\bert_vits2/bert/chinese-roberta-wwm-ext-large
2023-12-06 22:19:24 [ERROR] [utils.load_checkpoint:57] enc_p.emb.weight is not in the checkpoint
2023-12-06 22:19:27 [INFO] [utils.load_checkpoint:65] Loaded checkpoint 'C:\Users\LV HENG\Desktop\vits-simple-api1205\Model\azuma\G_17400.pth' (iteration 76)
2023-12-06 22:19:27 [INFO] [ModelManager._load_model_from_path:234] model_type:BERT-VITS2 model_id:0 n_speakers:1 model_path:C:\Users\LV HENG\Desktop\vits-simple-api1205\Model\azuma\G_17400.pth
2023-12-06 22:19:27 [INFO] [ModelManager.log_device_info:146] PyTorch Version: 1.13.1+cu117 Cuda available:True Device type:cuda
2023-12-06 22:19:27 [INFO] [ModelManager.log_device_info:151] Using GPU on NVIDIA GeForce RTX 4060 Laptop GPU, GPU Device Index: None
2023-12-06 22:19:27 [INFO] [ModelManager.model_init:97] [BERT-VITS2] 1 speakers
2023-12-06 22:19:27 [INFO] [ModelManager.model_init:99] 1 speakers in total.
2023-12-06 22:19:27 [INFO] [phrases_dict.phrases_dict_init:30] Loading phrases_dict
2023-12-06 22:19:28 [WARNING] [phrases_dict.load_phrases_from_file:24] File C:\Users\LV HENG\Desktop\vits-simple-api1205/phrases_dict.txt not found. You can create C:\Users\LV HENG\Desktop\vits-simple-api1205/phrases_dict.txt and write your phrases_dict.
2023-12-06 22:19:28 [DEBUG] [win32._get_localzone_name:58] Looking up time zone info from registry
2023-12-06 22:19:28 [INFO] [base.start:181] Scheduler started
2023-12-06 22:19:28 [INFO] [base._real_add_job:895] Added job "clean_task" to job store "default"
- Serving Flask app 'app'
- Debug mode: off
2023-12-06 22:19:28 [INFO] [_internal._log:187] WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
- Running on all addresses (0.0.0.0)
- Running on http://127.0.0.1:23456
- Running on http://192.168.1.3:23456
2023-12-06 22:19:28 [INFO] [_internal._log:187] Press CTRL+C to quit

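My main suspicions at the moment are:
1. bert-base-japanese-v3 failed to load (though that does not seem to affect Chinese inference?).
2. A model version compatibility problem: at first the inference output was nothing but electrical noise. After adding "version": "1.1.0-transition" to the config, the character's timbre came out correctly, but the unclear articulation remained.
3. [ERROR] [utils.load_checkpoint:57] enc_p.emb.weight is not in the checkpoint. I suspect this is what affects the final inference result.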
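
The third suspicion can be checked by listing the parameter names stored in the checkpoint. The following is a minimal sketch, assuming the .pth file is a PyTorch checkpoint whose weights sit under a "model" key (a common VITS layout, not confirmed in this thread). `find_missing_keys` and `load_checkpoint_keys` are hypothetical helper names, not part of vits-simple-api.

```python
from typing import Iterable, List


def find_missing_keys(checkpoint_keys: Iterable[str], expected: Iterable[str]) -> List[str]:
    """Return the expected parameter names that the checkpoint lacks."""
    present = set(checkpoint_keys)
    return [name for name in expected if name not in present]


def load_checkpoint_keys(path: str) -> List[str]:
    """Read the parameter names from a .pth file (requires PyTorch)."""
    import torch  # imported lazily so find_missing_keys works without PyTorch

    checkpoint = torch.load(path, map_location="cpu")
    # Assumption: weights live under "model"; otherwise use the top-level dict.
    state_dict = checkpoint.get("model", checkpoint)
    return list(state_dict.keys())


# Example call (path taken from the log above, relative to the project folder):
# print(find_missing_keys(load_checkpoint_keys(r"Model\azuma\G_17400.pth"),
#                         ["enc_p.emb.weight"]))
```

An empty list would mean the parameter is present in the file, which would point away from a damaged download and toward a mismatch between the checkpoint and the model definition used to load it.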

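I would be grateful for any guidance. Also: when you have time, would you consider building a pure bert-vits2 API? In my experience bert-vits2 performs really well overall. Thank you very much!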

Artrajz commented 11 months ago

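1. bert-base-japanese-v3 failed to load (though that does not seem to affect Chinese inference?).

You may have accidentally touched files other than the .pth. I once copied a few extra files along with a model by mistake and got the same error as yours.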
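
The log in the report states the load failure as `You need to install fugashi to use MecabTokenizer`, so one quick check is whether that package is importable from the Python environment that runs the server. A minimal sketch using only the standard library; `is_installed` is a hypothetical helper name.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """True if the current interpreter can locate `package`."""
    return importlib.util.find_spec(package) is not None


# Run this with the same interpreter that starts the API server.
for name in ("fugashi",):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If the package is reported as missing, the PyPI page linked in the error message is the place to start. Whether that also resolves the unclear articulation is not established in this thread.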

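2. A model version compatibility problem: at first the inference output was nothing but electrical noise. After adding "version": "1.1.0-transition" to the config, the character's timbre came out correctly, but the unclear articulation remained.

Having looked at this model, its version appears to be 1.0.1.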

3. [ERROR] [utils.load_checkpoint:57] enc_p.emb.weight is not in the checkpoint

I downloaded this model and loaded it without hitting this error, so your download may be incomplete.
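
One way to rule an incomplete download in or out is to hash the local file and compare the digest with a published one. This is a minimal sketch, assuming the hosting page shows a SHA-256 for the checkpoint (an assumption; the thread does not say so). `sha256_of_file` and `matches_expected` are hypothetical helper names.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large checkpoint need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with a published one, ignoring case and padding."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

A mismatch means the file differs from the published one. A match only shows the download is intact and leaves the other suspicions open.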

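When you have time, would you consider building a pure bert-vits2 API?

I don't quite follow. In this project you can load only bert-vits2 and use only the bert-vits2 API endpoint. The other endpoints do not interfere with bert-vits2, and their overhead is small. If the goal is just to slim down the code, you can delete everything other than the bert-vits2 code yourself.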
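
For illustration, calling only the bert-vits2 endpoint from a client might look like the sketch below. The host and port come from the startup log in the report. The `/voice/bert-vits2` path and the `text` and `id` parameter names are assumptions here and should be checked against the project's API documentation. `build_bert_vits2_url` and `synthesize` are hypothetical helper names.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_bert_vits2_url(base_url: str, text: str, speaker_id: int = 0) -> str:
    """Build the request URL for the (assumed) bert-vits2 endpoint."""
    # Assumption: endpoint path and the "text"/"id" parameter names.
    query = urlencode({"text": text, "id": speaker_id})
    return f"{base_url.rstrip('/')}/voice/bert-vits2?{query}"


def synthesize(base_url: str, text: str, out_path: str, speaker_id: int = 0) -> None:
    """Request audio from a running server and save the response body."""
    url = build_bert_vits2_url(base_url, text, speaker_id)
    with urlopen(url, timeout=60) as response:
        audio = response.read()
    with open(out_path, "wb") as handle:
        handle.write(audio)


# Example call against a locally running server:
# synthesize("http://127.0.0.1:23456", "你好", "out.wav")
```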

yuffiesakiya commented 11 months ago

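When you downloaded this model, did inference work without adding a version number? The model is a single file, so it shouldn't be incomplete. Also: where can I check the model version?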

Artrajz commented 11 months ago

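When you downloaded this model, did inference work without adding a version number?

Every older version needs a version number added. Only the current latest Bert-VITS2 release works without one.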
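
Adding the version number amounts to setting a `"version"` field in the model's config, as the report did with `"version": "1.1.0-transition"`. A minimal sketch, assuming the field belongs at the top level of the model's JSON config file (an assumption; the thread does not spell out the location). `with_version` and `add_version_to_file` are hypothetical helper names, and the value 1.0.1 is the version estimated earlier in this thread.

```python
import json


def with_version(config: dict, version: str) -> dict:
    """Return a copy of the config with a top-level "version" field set."""
    updated = dict(config)
    updated["version"] = version
    return updated


def add_version_to_file(config_path: str, version: str) -> None:
    """Rewrite a JSON config file in place with the version field added."""
    with open(config_path, "r", encoding="utf-8") as handle:
        config = json.load(handle)
    with open(config_path, "w", encoding="utf-8") as handle:
        json.dump(with_version(config, version), handle, ensure_ascii=False, indent=2)


# Example call (back up the file first):
# add_version_to_file(r"Model\azuma\config.json", "1.0.1")
```

The config file name in the example call is a guess. Which version strings the loader accepts is not listed in this thread.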

The model is a single file, so it shouldn't be incomplete.

That was only a guess to give you something to check. The cause may be something else.

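Where can I check the model version?

You can tell from the code in models.py and text/symbols.py, though you need some familiarity with the differences between versions to recognize it. For this model there is a more direct way: open the bert folder and you will find only chinese-roberta-wwm-ext-large in it. Only versions 1.0.1 and earlier ship with a single BERT model.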
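
The folder check can be scripted. This is a minimal sketch, assuming a local copy of the model's own repository with its bert folder, not the bert folder bundled with the API server. `list_bert_models` and `looks_like_single_bert_release` are hypothetical helper names, and the heuristic is only the rule of thumb stated above.

```python
import os
from typing import List


def list_bert_models(bert_dir: str) -> List[str]:
    """Names of the sub-folders inside the bert directory, sorted."""
    return sorted(
        entry
        for entry in os.listdir(bert_dir)
        if os.path.isdir(os.path.join(bert_dir, entry))
    )


def looks_like_single_bert_release(models: List[str]) -> bool:
    """Heuristic from the reply above: only the Chinese BERT model is present."""
    return models == ["chinese-roberta-wwm-ext-large"]


# Example call on a local clone of the model's repository:
# print(looks_like_single_bert_release(list_bert_models("bert")))
```

A result of True suggests a version of 1.0.1 or earlier. It does not identify the exact version, which still requires reading models.py and text/symbols.py.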

Artrajz / vits-simple-api