YYuX-1145 / Bert-VITS2-Integration-package

vits2 backbone with bert
https://www.bilibili.com/video/BV13p4y1d7v9
GNU Affero General Public License v3.0

worker 8, batch 1 #66

Closed bihailantian655 closed 7 months ago

bihailantian655 commented 8 months ago

I'm not sure whether this is related to the number of workers. With batch size set to 1 I get the error below; with a higher batch size, VRAM usage is too high.

```
Traceback (most recent call last):
  File "train_ms.py", line 840, in <module>
    run()
  File "train_ms.py", line 361, in run
    train_and_evaluate(
  File "train_ms.py", line 560, in train_and_evaluate
    loss_slm = wl.discriminator(
  File "I:\Bert-VITS2-2.3\losses.py", line 108, in discriminator
    wav_embeddings = self.wavlm(
  File "I:\Bert-VITS2-2.3\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "I:\Bert-VITS2-2.3\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "I:\Bert-VITS2-2.3\venv\lib\site-packages\transformers\models\wavlm\modeling_wavlm.py", line 1231, in forward
    extract_features = self.feature_extractor(input_values)
  File "I:\Bert-VITS2-2.3\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "I:\Bert-VITS2-2.3\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "I:\Bert-VITS2-2.3\venv\lib\site-packages\transformers\models\wavlm\modeling_wavlm.py", line 369, in forward
    hidden_states = conv_layer(hidden_states)
  File "I:\Bert-VITS2-2.3\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "I:\Bert-VITS2-2.3\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "I:\Bert-VITS2-2.3\venv\lib\site-packages\transformers\models\wavlm\modeling_wavlm.py", line 264, in forward
    hidden_states = self.conv(hidden_states)
  File "I:\Bert-VITS2-2.3\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "I:\Bert-VITS2-2.3\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "I:\Bert-VITS2-2.3\venv\lib\site-packages\torch\nn\modules\conv.py", line 310, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "I:\Bert-VITS2-2.3\venv\lib\site-packages\torch\nn\modules\conv.py", line 306, in _conv_forward
    return F.conv1d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [512, 1, 10], expected input[1, 5945, 1] to have 1 channels, but got 5945 channels instead
```
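For context, the failing call can be reproduced in isolation: `nn.Conv1d` expects input shaped `[batch, channels, time]`, and WavLM's first feature-extractor conv has `in_channels=1`, so a waveform that arrives as `[1, 5945, 1]` (time on the channel axis) is read as 5945 channels. This is a minimal sketch of the shape mismatch, not the project's actual data pipeline; the shapes and the transpose fix are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Mirrors WavLM's first conv: weight of size [512, 1, 10]
conv = nn.Conv1d(in_channels=1, out_channels=512, kernel_size=10)

# Correct layout: [batch, channels=1, time]
ok = conv(torch.randn(1, 1, 5945))
print(ok.shape)  # torch.Size([1, 512, 5936])

# Reproduces the reported error: time ended up on the channel axis
bad = torch.randn(1, 5945, 1)
try:
    conv(bad)
except RuntimeError as e:
    print(e)  # "... expected input[1, 5945, 1] to have 1 channels ..."

# Illustrative fix: swap the last two axes before the conv
fixed = conv(bad.transpose(1, 2))
print(fixed.shape)  # torch.Size([1, 512, 5936])
```

This suggests the batch-size-1 case drops or misplaces a dimension somewhere before the tensor reaches WavLM, which matches the maintainer's reply that it is an upstream bug.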

YYuX-1145 commented 8 months ago

This looks like a bug in the upstream project. Either way, I recommend a batch size of at least 2; otherwise the results will very likely be poor.

bihailantian655 commented 8 months ago

Can models be merged? That is, can several models that each contain one or a few speakers be combined into a single multi-speaker model, without retraining?

YYuX-1145 commented 8 months ago

No, that is not possible.