Using noise scaled MAS for VITS2
Using duration discriminator for VITS2
INFO:models:Loaded checkpoint 'Data\abc\models\DUR_0.pth' (iteration 0)
ERROR:models:emb_g.weight is not in the checkpoint
INFO:models:Loaded checkpoint 'Data\abc\models\G_0.pth' (iteration 0)
INFO:models:Loaded checkpoint 'Data\abc\models\D_0.pth' (iteration 0)
**Model detected: epoch 1, global step 0**
INFO:models:Loaded checkpoint 'Data\abc\models\WD_0.pth' (iteration 0)
Some weights of the model checkpoint at ./slm/wavlm-base-plus were not used when initializing WavLMModel: ['encoder.pos_conv_embed.conv.weight_g', 'encoder.pos_conv_embed.conv.weight_v']
This IS expected if you are initializing WavLMModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing WavLMModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of WavLMModel were not initialized from the model checkpoint at ./slm/wavlm-base-plus and are newly initialized: ['encoder.pos_conv_embed.conv.parametrizations.weight.original1', 'encoder.pos_conv_embed.conv.parametrizations.weight.original0']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
0%| | 0/7 [00:06<?, ?it/s]
Traceback (most recent call last):
  File "train_ms.py", line 840, in <module>
    run()
  File "train_ms.py", line 361, in run
    train_and_evaluate(
  File "train_ms.py", line 560, in train_and_evaluate
    loss_slm = wl.discriminator(
  File "E:\ai\Bert-VITS2-2.3\losses.py", line 108, in discriminator
    wav_embeddings = self.wavlm(
  File "E:\ai\Bert-VITS2-2.3\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "E:\ai\Bert-VITS2-2.3\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\ai\Bert-VITS2-2.3\venv\lib\site-packages\transformers\models\wavlm\modeling_wavlm.py", line 1231, in forward
    extract_features = self.feature_extractor(input_values)
  File "E:\ai\Bert-VITS2-2.3\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "E:\ai\Bert-VITS2-2.3\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\ai\Bert-VITS2-2.3\venv\lib\site-packages\transformers\models\wavlm\modeling_wavlm.py", line 369, in forward
    hidden_states = conv_layer(hidden_states)
  File "E:\ai\Bert-VITS2-2.3\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "E:\ai\Bert-VITS2-2.3\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\ai\Bert-VITS2-2.3\venv\lib\site-packages\transformers\models\wavlm\modeling_wavlm.py", line 264, in forward
    hidden_states = self.conv(hidden_states)
  File "E:\ai\Bert-VITS2-2.3\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "E:\ai\Bert-VITS2-2.3\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\ai\Bert-VITS2-2.3\venv\lib\site-packages\torch\nn\modules\conv.py", line 310, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "E:\ai\Bert-VITS2-2.3\venv\lib\site-packages\torch\nn\modules\conv.py", line 306, in _conv_forward
    return F.conv1d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [512, 1, 10], expected input[1, 5945, 1] to have 1 channels, but got 5945 channels instead
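The RuntimeError is a channel-layout mismatch: torch.nn.Conv1d expects input shaped (batch, channels, length), but WavLM's first feature-extractor convolution (the [512, 1, 10] weight in the message) received (1, 5945, 1), so the 5945 audio samples were read as 5945 channels. Below is a minimal sketch of the mismatch and one way to reorder the tensor; the Conv1d parameters mirror the weight shape in the error, and the assumption that the waveform carries a trailing singleton channel dimension before reaching self.wavlm(...) is an inference from the reported shape, not confirmed from the Bert-VITS2 code:

```python
import torch

# Conv1d matching the [512, 1, 10] weight from the error: 1 in-channel, 512 out, kernel 10.
conv = torch.nn.Conv1d(1, 512, kernel_size=10, stride=5)

# Waveform with a trailing singleton dim, as implied by input[1, 5945, 1].
wav = torch.randn(1, 5945, 1)

try:
    conv(wav)  # dim 1 is treated as channels -> 5945 channels, so this raises
except RuntimeError as e:
    print("fails:", e)

# Move the samples to the length axis: (1, 5945, 1) -> (1, 1, 5945).
fixed = wav.transpose(1, 2)
out = conv(fixed)
print(out.shape)  # torch.Size([1, 512, 1188])
```

In practice WavLMModel wants a 2-D input_values of shape (batch, samples) and adds the channel axis itself, so squeezing the extra dimension before the self.wavlm(...) call in losses.py (e.g. wav.squeeze(-1), if the trailing dim is indeed the culprit) is the likely fix rather than transposing inside the model.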