YYuX-1145 / Bert-VITS2-Integration-package

vits2 backbone with bert
https://www.bilibili.com/video/BV13p4y1d7v9
GNU Affero General Public License v3.0
335 stars 29 forks

Error during training #7

Closed makeukus closed 1 year ago

makeukus commented 1 year ago

INFO:OUTPUT_MODEL:{'train': {'log_interval': 10, 'eval_interval': 1000, 'seed': 52, 'epochs': 1000, 'learning_rate': 0.0002, 'betas': [0.8, 0.99], 'eps': 1e-09, 'batch_size': 16, 'fp16_run': False, 'lr_decay': 0.999875, 'segment_size': 16384, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0}, 'data': {'use_mel_posterior_encoder': False, 'training_files': 'filelists/train.list', 'validation_files': 'filelists/val.list', 'max_wav_value': 32768.0, 'sampling_rate': 44100, 'filter_length': 2048, 'hop_length': 512, 'win_length': 2048, 'n_mel_channels': 128, 'mel_fmin': 0.0, 'mel_fmax': None, 'add_blank': True, 'n_speakers': 1, 'cleaned_text': True, 'spk2id': {'MZ_GIRL': 0}}, 'model': {'use_spk_conditioned_encoder': True, 'use_noise_scaled_mas': True, 'use_mel_posterior_encoder': False, 'use_duration_discriminator': True, 'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0.1, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [8, 8, 2, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 8, 2, 2], 'n_layers_q': 3, 'use_spectral_norm': False, 'gin_channels': 256}, 'model_dir': './logs\./OUTPUT_MODEL', 'cont': False}
WARNING:OUTPUT_MODEL:D:\BaiduNetdiskDownload\0903\Bert-VITS2-Integration-Package is not a git repository, therefore hash value comparison will be ignored.
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
skipped: 0 , total: 757
skipped: 0 , total: 4
Using noise scaled MAS for VITS2
Using duration discriminator for VITS2
256 2
256 2
256 2
256 2
256 2
./logs./OUTPUT_MODEL\DUR_0.pth
error, norm_1.gamma is not in the checkpoint
error, norm_1.beta is not in the checkpoint
error, norm_2.gamma is not in the checkpoint
error, norm_2.beta is not in the checkpoint
error, cond.weight is not in the checkpoint
error, cond.bias is not in the checkpoint
load
INFO:OUTPUT_MODEL:Loaded checkpoint './logs./OUTPUT_MODEL\DUR_0.pth' (iteration 694)
./logs./OUTPUT_MODEL\G_0.pth
error, emb_g.weight is not in the checkpoint
load
INFO:OUTPUT_MODEL:Loaded checkpoint './logs./OUTPUT_MODEL\G_0.pth' (iteration 0)
./logs./OUTPUT_MODEL\D_0.pth
load
INFO:OUTPUT_MODEL:Loaded checkpoint './logs./OUTPUT_MODEL\D_0.pth' (iteration 0)
0it [00:03, ?it/s]
Traceback (most recent call last):
  File "train_ms.py", line 402, in <module>
    main()
  File "train_ms.py", line 60, in main
    mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
  File "D:\BaiduNetdiskDownload\0903\Bert-VITS2-Integration-Package\venv\lib\site-packages\torch\multiprocessing\spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "D:\BaiduNetdiskDownload\0903\Bert-VITS2-Integration-Package\venv\lib\site-packages\torch\multiprocessing\spawn.py", line 198, in start_processes
    while not context.join():
  File "D:\BaiduNetdiskDownload\0903\Bert-VITS2-Integration-Package\venv\lib\site-packages\torch\multiprocessing\spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "D:\BaiduNetdiskDownload\0903\Bert-VITS2-Integration-Package\venv\lib\site-packages\torch\multiprocessing\spawn.py", line 69, in _wrap
    fn(i, *args)
  File "D:\BaiduNetdiskDownload\0903\Bert-VITS2-Integration-Package\train_ms.py", line 193, in run
    train_and_evaluate(rank, epoch, hps, [net_g, net_d, net_dur_disc], [optim_g, optim_d, optim_dur_disc], [scheduler_g, scheduler_d, scheduler_dur_disc], scaler, [train_loader, eval_loader], logger, [writer, writer_eval])
  File "D:\BaiduNetdiskDownload\0903\Bert-VITS2-Integration-Package\train_ms.py", line 231, in train_and_evaluate
    (z, z_p, m_p, logs_p, m_q, logs_q), (hidden_x, logw, logw_) = net_g(x, x_lengths, spec, spec_lengths, speakers, tone, language, bert)
  File "D:\BaiduNetdiskDownload\0903\Bert-VITS2-Integration-Package\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\BaiduNetdiskDownload\0903\Bert-VITS2-Integration-Package\venv\lib\site-packages\torch\nn\parallel\distributed.py", line 1040, in forward
    output = self._run_ddp_forward(*inputs, **kwargs)
  File "D:\BaiduNetdiskDownload\0903\Bert-VITS2-Integration-Package\venv\lib\site-packages\torch\nn\parallel\distributed.py", line 1000, in _run_ddp_forward
    return module_to_run(*inputs[0], **kwargs[0])
  File "D:\BaiduNetdiskDownload\0903\Bert-VITS2-Integration-Package\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\BaiduNetdiskDownload\0903\Bert-VITS2-Integration-Package\models.py", line 680, in forward
    z_slice, ids_slice = commons.rand_slice_segments(z, y_lengths, self.segment_size)
  File "D:\BaiduNetdiskDownload\0903\Bert-VITS2-Integration-Package\commons.py", line 63, in rand_slice_segments
    ret = slice_segments(x, ids_str, segment_size)
  File "D:\BaiduNetdiskDownload\0903\Bert-VITS2-Integration-Package\commons.py", line 53, in slice_segments
    ret[i] = x[i, :, idx_str:idx_end]
RuntimeError: The expanded size of the tensor (32) must match the existing size (0) at non-singleton dimension 1. Target sizes: [192, 32]. Tensor sizes: [192, 0]
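A rough way to read the final RuntimeError, using only numbers from the logged config: the segment length in frames is segment_size / hop_length, which is the 32 in the message, while the slice taken from the input came out with length 0. The sketch below mimics that slice with a plain Python list instead of a tensor (an assumption: tensor slicing follows the same start/end rules in this case), and `slice_length` is a hypothetical helper, not part of the project. It shows one way an empty slice can arise: a negative start index wraps around to near the end of the sequence while the end index stays small.

```python
# Values copied from the logged config.
SEGMENT_SIZE_SAMPLES = 16384   # train.segment_size
HOP_LENGTH = 512               # data.hop_length
SAMPLING_RATE = 44100          # data.sampling_rate

# Segment length in frames; matches the "32" in the error message.
segment_frames = SEGMENT_SIZE_SAMPLES // HOP_LENGTH

# Shortest clip (in seconds) that still contains one full segment.
min_clip_seconds = segment_frames * HOP_LENGTH / SAMPLING_RATE


def slice_length(total_frames, idx_str, segment_frames):
    """Hypothetical helper: length of seq[idx_str:idx_str + segment_frames]."""
    frames = list(range(total_frames))
    return len(frames[idx_str:idx_str + segment_frames])


print(segment_frames)                # 32
print(round(min_clip_seconds, 3))    # 0.372
print(slice_length(400, 100, 32))    # 32: an in-range start gives a full segment
print(slice_length(400, -5, 32))     # 0: start wraps to 395, end is 27
```

If that is what happened here, any clip shorter than roughly 0.37 s would be a candidate trigger, though the log alone does not confirm it.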

YYuX-1145 commented 1 year ago

Try deleting all the files in the model output directory and running it again?

makeukus commented 1 year ago

Do you mean /logs./OUTPUT_MODE? I tried that and it still doesn't work.

makeukus commented 1 year ago

[Screenshot: Snipaste_2023-09-04_13-56-47] This is what the AI replied.

makeukus commented 1 year ago

def slice_segments(x, ids_str, segment_size=4):
  ret = torch.zeros_like(x[:, :, :segment_size])
  for i in range(x.size(0)):
    idx_str = ids_str[i]
    idx_end = idx_str + segment_size
    try:
      ret[i] = x[i, :, idx_str:idx_end]
    except RuntimeError:
      print("?")
  return ret

With this change it runs, but it feels like the error is just being ignored, and I'm not sure what problems that will cause down the line.

YYuX-1145 commented 1 year ago

> With this change it runs, but it feels like the error is just being ignored, and I'm not sure what problems that will cause down the line.

I haven't modified commons.py, and I don't know what is causing the problem. Also, my guess is that your model may produce silence at inference.

makeukus commented 1 year ago

Ahaha, then I'd better test it right away. Training is already at the 2000 .pth checkpoint.

makeukus commented 1 year ago

> Also, my guess is that your model may produce silence at inference.

It's not just silence, it goes straight to ERROR......

makeukus commented 1 year ago

@YYuX-1145 Could it be a dataset problem? For example, audio clips that are too long (over 20 seconds), or that contain too much silence?

YYuX-1145 commented 1 year ago

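> @YYuX-1145 Could it be a dataset problem? For example, audio clips that are too long (over 20 seconds), or that contain too much silence?

As for silence, I don't know, but it is certainly possible. As for length, 30 s clips from the Genshin Impact voice lines at least did not raise an error. Overly long clips can easily exhaust GPU memory, though.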
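To check the dataset hypothesis directly, clip durations can be read from the WAV headers. Below is a minimal sketch using only the standard-library `wave` module (so it assumes uncompressed PCM WAV files). The function names and thresholds are hypothetical: 20 s comes from the question above, and 0.4 s is chosen because the logged segment_size of 16384 samples at 44100 Hz is about 0.37 s.

```python
import wave


def wav_duration_seconds(path):
    """Duration of a PCM WAV file, computed from its header."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def find_out_of_range_clips(paths, min_seconds=0.4, max_seconds=20.0):
    """Return (path, duration) pairs whose duration falls outside the range."""
    flagged = []
    for path in paths:
        duration = wav_duration_seconds(path)
        if duration < min_seconds or duration > max_seconds:
            flagged.append((path, duration))
    return flagged
```

The paths could be taken from the first `|`-separated field of each line in filelists/train.list, assuming the file list uses that layout. This only checks length; it does not detect clips that are mostly silence.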

makeukus commented 1 year ago

It's working now, thanks. It was my own mistake.

YYuX-1145 commented 1 year ago

> It's working now, thanks. It was my own mistake.

Could you say what caused it, then? I'm collecting error cases, so that would help. Thanks!

makeukus commented 1 year ago

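> Could you say what caused it, then? I'm collecting error cases, so that would help. Thanks!

I had been mixing several projects together. Once I used only this project, the problem above went away.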

Chopin68 commented 1 year ago

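> Could you say what caused it, then? I'm collecting error cases, so that would help. Thanks!

The problem is in slice_segments: idx_str was negative, which made the sliced tensor invalid. Adding a bounds check fixed it.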

makeukus commented 1 year ago

> The problem is in slice_segments: idx_str was negative, which made the sliced tensor invalid. Adding a bounds check fixed it.

Thanks!

youxingtian commented 1 year ago

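> The problem is in slice_segments: idx_str was negative, which made the sliced tensor invalid. Adding a bounds check fixed it.

Hello, could you tell me how to add the bounds check? I'd appreciate some guidance.

try:
  ret[i] = x[i, :, idx_str:idx_end]
except RuntimeError:
  print("?")

Is it just a matter of adding a guard like this?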

ShanChenqqM commented 1 year ago

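> I had been mixing several projects together. Once I used only this project, the problem above went away.

Hello, what do you mean by mixing multiple projects, and how do I use only this one project? I'd appreciate some pointers.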

ShanChenqqM commented 1 year ago

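> I had been mixing several projects together. Once I used only this project, the problem above went away.

This integration package was working fine for me two weeks ago, but today it suddenly stopped working.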

Wild-Piggggggg commented 11 months ago

> Hello, could you tell me how to add the bounds check? Is it just a matter of adding a guard like this?

You only need to change the slice_segments function in commons.py as follows. The bounds check takes a one-line change:

def slice_segments(x, ids_str, segment_size=4):
  ret = torch.zeros_like(x[:, :, :segment_size])
  for i in range(x.size(0)):
    idx_str = max(ids_str[i], 0)  # Changed here: when idx_str is negative, clamp it to 0 so the slice is not an invalid (empty) tensor
    idx_end = idx_str + segment_size
    ret[i] = x[i, :, idx_str:idx_end]
  return ret

After making this change I ran 1,500 iterations, and the results are already fairly decent, so give it a try~
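The one-line change covers a negative start index. As a hedged extension, the index arithmetic can also guard against a start that lies too close to the end of the sequence, and report how much padding would remain if the whole sequence is shorter than the segment. `safe_segment_bounds` below is a hypothetical helper written with plain integers, not part of commons.py, and it is a sketch of the bounds logic only.

```python
def safe_segment_bounds(idx_str, total_frames, segment_size):
    """Clamp a proposed start index so [start, end) stays inside the sequence.

    Returns (start, end, pad), where pad is the number of frames still
    missing when the sequence itself is shorter than segment_size.
    """
    # Largest start that still leaves room for a full segment (0 if none does).
    max_start = max(total_frames - segment_size, 0)
    start = min(max(int(idx_str), 0), max_start)
    end = min(start + segment_size, total_frames)
    pad = segment_size - (end - start)
    return start, end, pad


print(safe_segment_bounds(-5, 400, 32))   # (0, 32, 0)
print(safe_segment_bounds(390, 400, 32))  # (368, 400, 0)
print(safe_segment_bounds(-5, 20, 32))    # (0, 20, 12)
```

A caller would copy the frames from start to end into the first `end - start` positions of the output row and leave the remaining `pad` positions at zero. Whether zero-padded segments are acceptable for training is a separate question; dropping clips shorter than one segment from the dataset may be the cleaner option.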