INFO:root:Loading yaml from /home/xxx/vits-simple-api/config.yml
Building prefix dict from the default dictionary ...
DEBUG:jieba:Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
DEBUG:jieba:Loading model from cache /tmp/jieba.cache
Loading model cost 0.915 seconds.
DEBUG:jieba:Loading model cost 0.915 seconds.
Prefix dict has been built successfully.
DEBUG:jieba:Prefix dict has been built successfully.
2024-01-03 13:34:13 [INFO] [model_handler.load_bert:125] Loading BERT model: /home/xxx/vits-simple-api/bert_vits2/bert/deberta-v2-large-japanese-char-wwm
2024-01-03 13:34:19 [INFO] [model_handler.load_bert:130] Success loading: /home/xxx/vits-simple-api/bert_vits2/bert/deberta-v2-large-japanese-char-wwm
2024-01-03 13:34:19 [INFO] [model_handler.load_bert:125] Loading BERT model: /home/xxx/vits-simple-api/bert_vits2/bert/deberta-v3-large
/home/xxx/anaconda3/envs/vits-simple-api/lib/python3.10/site-packages/transformers/convert_slow_tokenizer.py:473: UserWarning: The sentencepiece tokenizer that you are converting to a fast tokenizer uses the byte fallback option which is not implemented in the fast tokenizers. In practice this means that the fast version of the tokenizer can produce unknown tokens whereas the sentencepiece version would have converted these unknown tokens into a sequence of byte tokens matching the original piece of text.
warnings.warn(
Some weights of DebertaV2ForMaskedLM were not initialized from the model checkpoint at /home/xxx/vits-simple-api/bert_vits2/bert/deberta-v3-large and are newly initialized: ['cls.predictions.decoder.bias', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
2024-01-03 13:34:28 [INFO] [model_handler.load_bert:130] Success loading: /home/xxx/vits-simple-api/bert_vits2/bert/deberta-v3-large
2024-01-03 13:34:28 [INFO] [model_handler.load_bert:125] Loading BERT model: /home/xxx/vits-simple-api/bert_vits2/bert/chinese-roberta-wwm-ext-large
Some weights of the model checkpoint at /home/xxx/vits-simple-api/bert_vits2/bert/chinese-roberta-wwm-ext-large were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'bert.pooler.dense.bias', 'bert.pooler.dense.weight']
This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
2024-01-03 13:34:36 [INFO] [model_handler.load_bert:130] Success loading: /home/xxx/vits-simple-api/bert_vits2/bert/chinese-roberta-wwm-ext-large
/home/xxx/anaconda3/envs/vits-simple-api/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
2024-01-03 13:34:42 [INFO] [utils.load_checkpoint:65] Loaded checkpoint '/home/xxx/vits-simple-api/Model/xxx/G_24750.pth' (iteration 188)
2024-01-03 13:34:42 [INFO] [ModelManager._load_model_from_path:229] model_type:BERT-VITS2 model_id:0 n_speakers:1 model_path:/home/xxx/vits-simple-api/Model/xxx/G_24750.pth
Runtime environment
Problem description
The model is a Chinese-English bilingual model trained with Bert-vits2-V2.3; mixed Chinese-English text generation works fine in its bundled gradio webui.
When the model is loaded with vits-simple-api, a few warnings appear: the BertForMaskedLM / DebertaV2ForMaskedLM initialization notices and the torch.nn.utils.weight_norm deprecation warning already shown in the startup log above.
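As far as I can tell, those messages are the standard transformers notices for loading a plain encoder checkpoint into a `*ForMaskedLM` class (the unused NSP/pooler weights are dropped and the MLM head is freshly initialized), so they normally do not affect BERT feature extraction by themselves. A minimal sketch, outside vits-simple-api, that reproduces and optionally silences them; the checkpoint path is copied from the log above:

```python
# Sketch only (not vits-simple-api code): load the same BERT checkpoint the way
# transformers does, to confirm the warning concerns only the MLM/NSP heads.
import transformers
from transformers import AutoModelForMaskedLM, AutoTokenizer

path = "/home/xxx/vits-simple-api/bert_vits2/bert/chinese-roberta-wwm-ext-large"

# Optional: hide the "Some weights ... were not used" notice once the model is
# confirmed to load and run correctly.
transformers.logging.set_verbosity_error()

tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForMaskedLM.from_pretrained(path)
```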
The version field inside the model's config.json is "version": "2.3".
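For clarity, that value comes from the checkpoint's config.json, which vits-simple-api presumably uses to pick the Bert-VITS2 version. A minimal sketch to print it (the exact path of the config.json next to G_24750.pth is my assumption):

```python
# Print the version field that the loader reads; adjust the path to the
# config.json that sits alongside the G_*.pth checkpoint.
import json

with open("/home/xxx/vits-simple-api/Model/xxx/config.json", encoding="utf-8") as f:
    cfg = json.load(f)

print(cfg.get("version"))  # expected: "2.3"
```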
The Chinese-English bilingual model is recognized as a trilingual Chinese-Japanese-English model (consistent with the startup log above, which loads all three BERT models: Japanese, English, and Chinese). Chinese inference works fine, but as soon as the text contains any English letter the webui reports the error "无法获取音频数据" (failed to get audio data). Japanese text, which the model was never trained on, generates normally and actually sounds quite good.
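In case it helps triage, the same failure should be reproducible through the HTTP API directly. This is only a reproduction sketch: the /voice/bert-vits2 endpoint, its parameter names, and the default port 23456 are my assumptions based on the project README and may need adjusting to your deployment:

```python
# Reproduction sketch with requests; endpoint, parameters, and port are
# assumptions, not verified against this exact vits-simple-api version.
import requests

BASE = "http://127.0.0.1:23456"

def tts(text, lang="auto"):
    r = requests.get(f"{BASE}/voice/bert-vits2",
                     params={"text": text, "id": 0, "lang": lang, "format": "wav"})
    print(text, "->", r.status_code, r.headers.get("Content-Type"))
    return r

tts("你好,今天天气不错。")    # Chinese only: works
tts("你好,hello world。")     # contains English letters: fails for me
```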
Error log for English input:
I have already looked at issues #111 and #113, but they did not solve this. I suspect it is a 2.3 compatibility problem, and I am considering rolling back to 2.1 or a more reliable Bert-VITS2 version and retraining.