Artrajz / vits-simple-api

A simple VITS HTTP API, developed by extending Moegoe with additional features.
GNU Affero General Public License v3.0

Bert-VITS2 base model load error: `emb_g.weight is not in the checkpoint` #100

Closed JKCU2014 closed 11 months ago

JKCU2014 commented 11 months ago

Runtime environment

Problem description

The base model is the latest one downloaded from the Bert-VITS2中日英底模-fix release (the Chinese/Japanese/English base model), and the config.json is the one from the Bert-VITS2 repo.

Both the model folder and the config file are placed under the Model folder. Loading the model throws the following error:

emb_g.weight is not in the checkpoint

Audio can still be generated, but it sounds a bit odd, and the voices do not quite match the characters.

More detailed log context:

vits-simple-api-vits-1  | 2023-11-22 11:41:51 [INFO] [bert_handler.load_bert:84] Loading BERT model: /app/bert_vits2/bert/deberta-v2-large-japanese
vits-simple-api-vits-1  | 2023-11-22 11:41:56 [INFO] [bert_handler.load_bert:89] Success loading: /app/bert_vits2/bert/deberta-v2-large-japanese
vits-simple-api-vits-1  | 2023-11-22 11:41:56 [INFO] [bert_handler.load_bert:84] Loading BERT model: /app/bert_vits2/bert/deberta-v3-large
vits-simple-api-vits-1  | /usr/local/lib/python3.10/site-packages/transformers/convert_slow_tokenizer.py:473: UserWarning: The sentencepiece tokenizer that you are converting to a fast tokenizer uses the byte fallback option which is not implemented in the fast tokenizers. In practice this means that the fast version of the tokenizer can produce unknown tokens whereas the sentencepiece version would have converted these unknown tokens into a sequence of byte tokens matching the original piece of text.
vits-simple-api-vits-1  |   warnings.warn(
vits-simple-api-vits-1  | Some weights of DebertaV2ForMaskedLM were not initialized from the model checkpoint at /app/bert_vits2/bert/deberta-v3-large and are newly initialized: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight']
vits-simple-api-vits-1  | You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
vits-simple-api-vits-1  | 2023-11-22 11:42:02 [INFO] [bert_handler.load_bert:89] Success loading: /app/bert_vits2/bert/deberta-v3-large
vits-simple-api-vits-1  | 2023-11-22 11:42:02 [INFO] [bert_handler.load_bert:84] Loading BERT model: /app/bert_vits2/bert/chinese-roberta-wwm-ext-large
vits-simple-api-vits-1  | Some weights of the model checkpoint at /app/bert_vits2/bert/chinese-roberta-wwm-ext-large were not used when initializing BertForMaskedLM: ['bert.pooler.dense.weight', 'cls.seq_relationship.weight', 'bert.pooler.dense.bias', 'cls.seq_relationship.bias']
vits-simple-api-vits-1  | - This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
vits-simple-api-vits-1  | - This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
vits-simple-api-vits-1  | 2023-11-22 11:42:06 [INFO] [bert_handler.load_bert:89] Success loading: /app/bert_vits2/bert/chinese-roberta-wwm-ext-large
vits-simple-api-vits-1  | 2023-11-22 11:42:07 [ERROR] [utils.load_checkpoint:57] emb_g.weight is not in the checkpoint
vits-simple-api-vits-1  | 2023-11-22 11:42:07 [INFO] [utils.load_checkpoint:65] Loaded checkpoint '/app/Model/bert_vits2/Bert-VITS2-ZH-JP-EN_20231110/G_0.pth' (iteration 0)
vits-simple-api-vits-1  | 2023-11-22 11:42:07 [INFO] [ModelManager._load_model_from_path:218] model_type:BERT-VITS2 model_id:0 n_speakers:666 model_path:/app/Model/bert_vits2/Bert-VITS2-ZH-JP-EN_20231110/G_0.pth
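For reference, whether the key is really missing can be confirmed straight from the checkpoint file. A minimal sketch, assuming the usual VITS-style layout where the weights sit under a 'model' entry; the path matches the model_list entry in the config.yml shown below:

```python
import torch

# Path taken from the model_list entry in config.yml; adjust to your setup.
ckpt_path = "Model/bert_vits2/Bert-VITS2-ZH-JP-EN_20231110/G_0.pth"

# VITS-style generator checkpoints are usually a dict whose 'model' entry
# holds the actual state_dict; load on CPU so no GPU is needed for the check.
ckpt = torch.load(ckpt_path, map_location="cpu")
state_dict = ckpt.get("model", ckpt)

key = "emb_g.weight"
print(f"{key} present:", key in state_dict)
if key in state_dict:
    # emb_g is the speaker embedding table; its first dimension is the
    # number of speakers the checkpoint was trained with.
    print("speaker embedding shape:", tuple(state_dict[key].shape))
```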

Steps to reproduce

After deploying according to https://github.com/JKCU2014/vits-simple-api#step-1-pull-the-docker-image, put the downloaded models under the Model folder.

config.yml is as follows:

'ABS_PATH': '/app'
'ADMIN_ROUTE': '/admin'
'API_KEY': '4192fa428db1d985d82cfe469b2f3f0215ec346b7c10f0095b27b09244037c8f'
'API_KEY_ENABLED': !!bool 'false'
'CACHE_PATH': '/app/cache'
'CLEAN_INTERVAL_SECONDS': !!int '3600'
'DEBUG': !!bool 'false'
'DEVICE': !torch.device 'cuda'
'DYNAMIC_LOADING': !!bool 'false'
'ESPEAK_LIBRARY': ''
'IS_ADMIN_ENABLED': !!bool 'true'
'JSON_AS_ASCII': !!bool 'false'
'LANGUAGE_AUTOMATIC_DETECT': []
'LANGUAGE_IDENTIFICATION_LIBRARY': 'langid'
'LOGGING_LEVEL': 'DEBUG'
'LOGS_BACKUPCOUNT': !!int '30'
'LOGS_PATH': '/app/logs'
'MAX_CONTENT_LENGTH': !!int '5242880'
'PORT': !!int '23456'
'SAVE_AUDIO': !!bool 'false'
'SECRET_KEY': 'c5d61c6c2b8b210662d4bd667aa30691fccd52ba2d71ef597262fb4b75db84f6'
'UPLOAD_FOLDER': '/app/upload'
'default_parameter':
  'format': 'wav'
  'id': !!int '0'
  'lang': 'AUTO'
  'length': !!int '1'
  'length_en': !!int '0'
  'length_ja': !!int '0'
  'length_zh': !!int '0'
  'noise': !!float '0.33'
  'noisew': !!float '0.4'
  'sdp_ratio': !!float '0.2'
  'segment_size': !!int '50'
'model_config':
  'dimensional_emotion_npy': '/app/Model/npy'
  'hubert_soft_model': '/app/Model/hubert-soft-0d54a1f4.pt'
  'model_list':
  - - '/app/Model/bert_vits2/Bert-VITS2-ZH-JP-EN_20231110/G_0.pth'
    - '/app/Model/bert_vits2/Bert-VITS2-ZH-JP-EN_20231110/config.json'
  - - '/app/Model/bert_vits2/Azuma/G_17400.pth'
    - '/app/Model/bert_vits2/Azuma/config.json'
  - - '/app/Model/bert_vits2/keqing/G_18000.pth'
    - '/app/Model/bert_vits2/keqing/config.json'
  - - '/app/Model/bert_vits2/LAPLACE/G_18000.pth'
    - '/app/Model/bert_vits2/LAPLACE/config.json'
  - - '/app/Model/bert_vits2/paimeng/G_24000.pth'
    - '/app/Model/bert_vits2/paimeng/config.json'
'users':
  'admin':
    'admin': !User
      'id': !!int '1'
      'password': 'Avlsc7FDnmDXmPSE'
      'username': 'RXgrrT0A'

Bring it up with `docker compose up`.
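To double-check which speakers actually got registered after startup, the running instance can be queried. A small sketch, assuming the speaker-listing endpoint (GET /voice/speakers) described in the project README and the port set in config.yml above; the exact response shape may differ:

```python
import requests

# Query the local instance started by docker compose (port from config.yml).
resp = requests.get("http://127.0.0.1:23456/voice/speakers", timeout=10)
resp.raise_for_status()

# Print the raw speaker listing to compare IDs/names against expectations.
print(resp.json())
```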

Artrajz commented 11 months ago

Is this the base (pretrained) model downloaded from the official Bert-VITS2 repository? The base model is not meant to be used directly for inference: its speaker embeddings were pruned, which is why you get the error that the speaker weights cannot be loaded.
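For context, this matches the usual VITS-style checkpoint-loading behaviour: keys missing from the checkpoint are logged and left at their random initialization, which is why synthesis still runs but speaker identity comes out wrong. A minimal sketch of that pattern, not this project's actual code:

```python
import logging
import torch

def load_checkpoint_sketch(ckpt_path, model):
    # Load the saved state_dict (VITS-style checkpoints keep it under 'model').
    saved_state = torch.load(ckpt_path, map_location="cpu")["model"]
    new_state = {}
    for key, value in model.state_dict().items():
        if key in saved_state:
            new_state[key] = saved_state[key]
        else:
            # Missing keys (e.g. emb_g.weight in a pruned base model) are only
            # logged; the randomly initialized weight stays in place.
            logging.error("%s is not in the checkpoint", key)
            new_state[key] = value
    model.load_state_dict(new_state)
    return model
```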

JKCU2014 commented 11 months ago

> Is this the base (pretrained) model downloaded from the official Bert-VITS2 repository? The base model is not meant to be used directly for inference: its speaker embeddings were pruned, which is why you get the error that the speaker weights cannot be loaded.

@Artrajz Yes, it was downloaded from the official Bert-VITS2 repository. What is this base model for, then? Is there any way to load the speakers?

Artrajz commented 11 months ago

The base model is a starting point used during training to save training time. Since its speakers have been pruned there is no workaround; you have to gather suitable data yourself and fine-tune it the normal way before using it for inference.

JKCU2014 commented 11 months ago

> The base model is a starting point used during training to save training time. Since its speakers have been pruned there is no workaround; you have to gather suitable data yourself and fine-tune it the normal way before using it for inference.

Understood, thanks!