YYuX-1145 / Bert-VITS2-Integration-package

vits2 backbone with bert
https://www.bilibili.com/video/BV13p4y1d7v9
GNU Affero General Public License v3.0
332 stars 30 forks

First training run works, but resuming training throws an error #12

Open langyaya opened 1 year ago

langyaya commented 1 year ago

RuntimeError: min(): Expected reduction dim to be specified for input.numel() == 0. Specify the reduction dim with the 'dim' argument.
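
This error is easy to reproduce in isolation: `torch.min` with no `dim` argument is a full reduction, which is undefined when the tensor has zero elements. A minimal sketch (not the repo's code, just the failing call on its own):

```python
import torch

# A full reduction over an empty tensor has no values to reduce over,
# so torch.min raises the RuntimeError quoted above.
empty = torch.empty(0)
try:
    torch.min(empty)
except RuntimeError as err:
    print(type(err).__name__)  # RuntimeError

# With an explicit dim (as the message suggests), the same call returns
# empty values/indices instead of raising.
values, indices = torch.min(torch.empty(0, 3), dim=1)
```

So the real question is why an empty tensor reaches the spline at all when resuming.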

langyaya commented 1 year ago

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "D:\BaiduNetdiskDownload\Bert-VITS2-Integration-Package\venv\lib\site-packages\torch\multiprocessing\spawn.py", line 69, in _wrap
    fn(i, *args)
  File "D:\BaiduNetdiskDownload\Bert-VITS2-Integration-Package\train_ms.py", line 193, in run
    train_and_evaluate(rank, epoch, hps, [net_g, net_d, net_dur_disc], [optim_g, optim_d, optim_dur_disc], [scheduler_g, scheduler_d, scheduler_dur_disc], scaler, [train_loader, eval_loader], logger, [writer, writer_eval])
  File "D:\BaiduNetdiskDownload\Bert-VITS2-Integration-Package\train_ms.py", line 231, in train_and_evaluate
    (z, z_p, m_p, logs_p, m_q, logs_q), (hidden_x, logw, logw_) = net_g(x, x_lengths, spec, spec_lengths, speakers, tone, language, bert)
  File "D:\BaiduNetdiskDownload\Bert-VITS2-Integration-Package\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\BaiduNetdiskDownload\Bert-VITS2-Integration-Package\venv\lib\site-packages\torch\nn\parallel\distributed.py", line 1040, in forward
    output = self._run_ddp_forward(*inputs, **kwargs)
  File "D:\BaiduNetdiskDownload\Bert-VITS2-Integration-Package\venv\lib\site-packages\torch\nn\parallel\distributed.py", line 1000, in _run_ddp_forward
    return module_to_run(*inputs[0], **kwargs[0])
  File "D:\BaiduNetdiskDownload\Bert-VITS2-Integration-Package\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\BaiduNetdiskDownload\Bert-VITS2-Integration-Package\models.py", line 667, in forward
    l_length_sdp = self.sdp(x, x_mask, w, g=g)
  File "D:\BaiduNetdiskDownload\Bert-VITS2-Integration-Package\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\BaiduNetdiskDownload\Bert-VITS2-Integration-Package\models.py", line 177, in forward
    z_q, logdet_q = flow(z_q, x_mask, g=(x + h_w))
  File "D:\BaiduNetdiskDownload\Bert-VITS2-Integration-Package\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\BaiduNetdiskDownload\Bert-VITS2-Integration-Package\modules.py", line 374, in forward
    x1, logabsdet = piecewise_rational_quadratic_transform(x1,
  File "D:\BaiduNetdiskDownload\Bert-VITS2-Integration-Package\transforms.py", line 33, in piecewise_rational_quadratic_transform
    outputs, logabsdet = spline_fn(
  File "D:\BaiduNetdiskDownload\Bert-VITS2-Integration-Package\transforms.py", line 82, in unconstrained_rational_quadratic_spline
    outputs[inside_interval_mask], logabsdet[inside_interval_mask] = rational_quadratic_spline(
  File "D:\BaiduNetdiskDownload\Bert-VITS2-Integration-Package\transforms.py", line 105, in rational_quadratic_spline
    if torch.min(inputs) < left or torch.max(inputs) > right:
RuntimeError: min(): Expected reduction dim to be specified for input.numel() == 0. Specify the reduction dim with the 'dim' argument.

YYuX-1145 commented 1 year ago


Could you post the loading log printed above this error?

langyaya commented 1 year ago

=========================
INFO:OUTPUT_MODEL:{'train': {'log_interval': 10, 'eval_interval': 30, 'seed': 52, 'epochs': 2000, 'learning_rate': 0.0002, 'betas': [0.8, 0.99], 'eps': 1e-09, 'batch_size': 5, 'fp16_run': False, 'lr_decay': 0.999875, 'segment_size': 16384, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0}, 'data': {'use_mel_posterior_encoder': False, 'training_files': 'filelists/train.list', 'validation_files': 'filelists/val.list', 'max_wav_value': 32768.0, 'sampling_rate': 44100, 'filter_length': 2048, 'hop_length': 512, 'win_length': 2048, 'n_mel_channels': 128, 'mel_fmin': 0.0, 'mel_fmax': None, 'add_blank': True, 'n_speakers': 1, 'cleaned_text': True, 'spk2id': {'zxcbert': 0}}, 'model': {'use_spk_conditioned_encoder': True, 'use_noise_scaled_mas': True, 'use_mel_posterior_encoder': False, 'use_duration_discriminator': True, 'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0.1, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [8, 8, 2, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 8, 2, 2], 'n_layers_q': 3, 'use_spectral_norm': False, 'gin_channels': 256}, 'model_dir': './logs\./OUTPUT_MODEL', 'cont': True}
WARNING:OUTPUT_MODEL:D:\BaiduNetdiskDownload\Bert-VITS2-Integration-Package is not a git repository, therefore hash value comparison will be ignored.
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
skipped: 0 , total: 198
skipped: 0 , total: 4
Using noise scaled MAS for VITS2
Using duration discriminator for VITS2
256 2
256 2
256 2
256 2
256 2
./logs./OUTPUT_MODEL\DUR_600.pth load
INFO:OUTPUT_MODEL:Loaded checkpoint './logs./OUTPUT_MODEL\DUR_600.pth' (iteration 15)
./logs./OUTPUT_MODEL\G_600.pth load
INFO:OUTPUT_MODEL:Loaded checkpoint './logs./OUTPUT_MODEL\G_600.pth' (iteration 15)
./logs./OUTPUT_MODEL\D_600.pth load
INFO:OUTPUT_MODEL:Loaded checkpoint './logs./OUTPUT_MODEL\D_600.pth' (iteration 15)
0it [00:00, ?it/s]D:\BaiduNetdiskDownload\Bert-VITS2-Integration-Package\mel_processing.py:78: FutureWarning: Pass sr=44100, n_fft=2048, n_mels=128, fmin=0.0, fmax=None as keyword args. From version 0.10 passing these as positional arguments will result in an error
  mel = librosa_mel_fn(sampling_rate, n_fft, num_mels, fmin, fmax)
[W C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\distributed\c10d\reducer.cpp:1305] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator ())
D:\BaiduNetdiskDownload\Bert-VITS2-Integration-Package\venv\lib\site-packages\torch\autograd\__init__.py:197: UserWarning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed. This is not an error, but may impair performance.
grad.sizes() = [1, 9, 96], strides() = [25632, 96, 1]
bucket_view.sizes() = [1, 9, 96], strides() = [864, 96, 1] (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\distributed\c10d\reducer.cpp:339.)
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
1it [00:06, 6.75s/it]
Traceback (most recent call last):
  File "train_ms.py", line 402, in <module>
    main()
  File "train_ms.py", line 60, in main
    mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
  File "D:\BaiduNetdiskDownload\Bert-VITS2-Integration-Package\venv\lib\site-packages\torch\multiprocessing\spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "D:\BaiduNetdiskDownload\Bert-VITS2-Integration-Package\venv\lib\site-packages\torch\multiprocessing\spawn.py", line 198, in start_processes
    while not context.join():
  File "D:\BaiduNetdiskDownload\Bert-VITS2-Integration-Package\venv\lib\site-packages\torch\multiprocessing\spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "D:\BaiduNetdiskDownload\Bert-VITS2-Integration-Package\venv\lib\site-packages\torch\multiprocessing\spawn.py", line 69, in _wrap

That's the log above.

YYuX-1145 commented 1 year ago

A strange problem, and one I haven't run into myself. I searched for this error; someone hit it while using vits (1) as well and solved it by downgrading torch. But logically, if the first training run didn't raise it, resuming training shouldn't either.

langyaya commented 1 year ago

Then I'll try downgrading and see whether that fixes it.

Misaka-Mikoto-Tech commented 8 months ago

I hit the same problem; the error output is as follows:

Loading setting from config: 0
Loading setting from config: localhost
Loading setting from config: 10086
Loading setting from config: 0
Loading setting from config: 1
Loading environment variables
MASTER_ADDR: localhost,
MASTER_PORT: 10086,
WORLD_SIZE: 1,
RANK: 0,
LOCAL_RANK: 0
12-31 07:04:37 INFO     | data_utils.py:63 | Init dataset...
100%|███████████████████████████████████████████████████████████████████████████| 1401/1401 [00:00<00:00, 17513.46it/s]
12-31 07:04:37 INFO     | data_utils.py:78 | skipped: 0, total: 1401
12-31 07:04:37 INFO     | data_utils.py:63 | Init dataset...
100%|████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<?, ?it/s]
12-31 07:04:37 INFO     | data_utils.py:78 | skipped: 0, total: 7
Using noise scaled MAS for VITS2
Using duration discriminator for VITS2
INFO:models:Loaded checkpoint 'Data\ichika\models\DUR_600.pth' (iteration 3)
INFO:models:Loaded checkpoint 'Data\ichika\models\G_700.pth' (iteration 3)
PytorchStreamReader failed reading zip archive: failed finding central directory
INFO:models:Loaded checkpoint 'Data\ichika\models\WD_600.pth' (iteration 3)
Traceback (most recent call last):
  File "train_ms.py", line 840, in <module>
    run()
  File "train_ms.py", line 338, in run
    scheduler_d = torch.optim.lr_scheduler.ExponentialLR(
  File "G:\AI\voice\train\Bert-VITS2-2.3一键包\venv\lib\site-packages\torch\optim\lr_scheduler.py", line 586, in __init__
    super().__init__(optimizer, last_epoch, verbose)
  File "G:\AI\voice\train\Bert-VITS2-2.3一键包\venv\lib\site-packages\torch\optim\lr_scheduler.py", line 42, in __init__
    raise KeyError("param 'initial_lr' is not specified "
KeyError: "param 'initial_lr' is not specified in param_groups[0] when resuming an optimizer"
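
This KeyError is a well-defined failure on its own: when a PyTorch LR scheduler is constructed with `last_epoch != -1` (i.e. resuming), it requires each optimizer param group to already contain `'initial_lr'`, which only exists if a scheduler wrote it there before the checkpoint was saved. A minimal reproduction and the usual backfill workaround (a sketch; the model, `lr`, and `gamma` values are illustrative, not the repo's):

```python
import torch

model = torch.nn.Linear(4, 4)
optim = torch.optim.SGD(model.parameters(), lr=2e-4)

# Resuming (last_epoch != -1) without 'initial_lr' in the param groups
# raises the KeyError from the traceback above.
try:
    torch.optim.lr_scheduler.ExponentialLR(optim, gamma=0.999875, last_epoch=2)
except KeyError as err:
    print(err)

# Workaround: backfill 'initial_lr' before constructing the scheduler.
for group in optim.param_groups:
    group.setdefault("initial_lr", group["lr"])
sched = torch.optim.lr_scheduler.ExponentialLR(optim, gamma=0.999875, last_epoch=2)
```

In this thread, though, the missing key is a downstream symptom: the optimizer state never loaded because the checkpoint file itself was unreadable (see the `PytorchStreamReader` line in the log).
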

Misaka-Mikoto-Tech commented 8 months ago

Problem solved: closing the console window right after the first training run corrupted a data file. Deleting the newest files fixed it.
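
The giveaway in the log above is "PytorchStreamReader failed reading zip archive: failed finding central directory": `.pth` checkpoints are zip archives, and killing the process mid-save leaves a truncated file with no central directory. Since that line appears right where G_700.pth is loaded, that file is the damaged one. A stdlib-only sketch for spotting truncated checkpoints before resuming (the function name and directory layout are illustrative):

```python
import zipfile
from pathlib import Path

def find_corrupt_checkpoints(model_dir: str) -> list:
    """Return .pth files that are not valid zip archives.

    PyTorch saves checkpoints in zip format; a file truncated by an
    abrupt shutdown fails zipfile.is_zipfile() and would raise
    'failed finding central directory' when torch.load opens it.
    """
    return [p for p in sorted(Path(model_dir).glob("*.pth"))
            if not zipfile.is_zipfile(p)]
```

Deleting whatever this reports (here, the newest G_*.pth) and resuming from the previous saved step matches the fix described above.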