Training on your project with the pretrained model you provide works fine. I then started a brand-new training run using the 标贝 pretrained model, modified the cleaner and config following another project that fine-tunes on the 标贝 pretrained model, and re-ran the preprocessing. Halfway through the first epoch it threw the error in the title. When you have time, could you please take a look and tell me whether this error comes from an incompatibility between that pretrained model and your project, or from my own config/cleaner settings? The full traceback is below:
INFO:checkpoints:Saving model and optimizer state at iteration 1 to ../vits-finetuning\checkpoints\G_0.pth
INFO:checkpoints:Saving model and optimizer state at iteration 1 to ../vits-finetuning\checkpoints\D_0.pth
7%|█████▌ | 1/15 [00:22<05:18, 22.73s/it]INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.
47%|██████████████████████████████████████▋ | 7/15 [00:33<00:38, 4.84s/it]
Traceback (most recent call last):
File "D:\GitHub\vits-finetuning\train_ms.py", line 308, in
main()
File "D:\GitHub\vits-finetuning\train_ms.py", line 58, in main
mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
File "d:\anaconda3\lib\site-packages\torch\multiprocessing\spawn.py", line 240, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "d:\anaconda3\lib\site-packages\torch\multiprocessing\spawn.py", line 198, in start_processes
while not context.join():
File "d:\anaconda3\lib\site-packages\torch\multiprocessing\spawn.py", line 160, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "d:\anaconda3\lib\site-packages\torch\multiprocessing\spawn.py", line 69, in _wrap
fn(i, *args)
File "D:\GitHub\vits-finetuning\train_ms.py", line 126, in run
train_and_evaluate(rank, epoch, hps, [net_g, net_d], [optim_g, optim_d], [scheduler_g, scheduler_d], scaler, [train_loader, eval_loader], logger, [writer, writer_eval])
File "D:\GitHub\vits-finetuning\train_ms.py", line 203, in train_and_evaluate
scaler.step(optim_g)
File "d:\anaconda3\lib\site-packages\torch\cuda\amp\grad_scaler.py", line 341, in step
retval = self._maybe_opt_step(optimizer, optimizer_state, *args, **kwargs)
File "d:\anaconda3\lib\site-packages\torch\cuda\amp\grad_scaler.py", line 288, in _maybe_opt_step
retval = optimizer.step(*args, **kwargs)
File "d:\anaconda3\lib\site-packages\torch\optim\lr_scheduler.py", line 68, in wrapper
return wrapped(*args, **kwargs)
File "d:\anaconda3\lib\site-packages\torch\optim\optimizer.py", line 140, in wrapper
out = func(*args, **kwargs)
File "d:\anaconda3\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "d:\anaconda3\lib\site-packages\torch\optim\adamw.py", line 162, in step
adamw(params_with_grad,
File "d:\anaconda3\lib\site-packages\torch\optim\adamw.py", line 219, in adamw
func(params,
File "d:\anaconda3\lib\site-packages\torch\optim\adamw.py", line 273, in _single_tensor_adamw
exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
RuntimeError: The size of tensor a (78) must match the size of tensor b (131) at non-singleton dimension 0
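For reference, here is a small diagnostic sketch of my own (not from the repo), assuming the saved checkpoint keeps the usual VITS layout with "model" and "optimizer" entries. It compares the weight shapes saved in G_0.pth against the Adam exp_avg buffers restored from the 标贝 optimizer state, to see where the 78 vs 131 mismatch comes from; my guess is the text symbol embedding, since I changed the cleaner and therefore the symbol count.

# Diagnostic sketch, assuming the checkpoint dict has "model" and "optimizer" keys
# (the standard VITS save format); the path below is just where my run saved G_0.pth.
import torch

ckpt = torch.load("../vits-finetuning/checkpoints/G_0.pth", map_location="cpu")

# Shapes of the model weights that were saved at iteration 1
for name, tensor in ckpt["model"].items():
    if 78 in tensor.shape or 131 in tensor.shape:
        print("model :", name, tuple(tensor.shape))

# Shapes of the exp_avg buffers restored from the pretrained optimizer state
for idx, state in ckpt.get("optimizer", {}).get("state", {}).items():
    if "exp_avg" in state and (78 in state["exp_avg"].shape or 131 in state["exp_avg"].shape):
        print("optim :", idx, tuple(state["exp_avg"].shape))

If the exp_avg shapes from the loaded optimizer state don't match the corresponding parameters of the model built from my new config, I assume the problem is on my side (symbol count changed by the cleaner/config) rather than in your code, but I'd appreciate your confirmation.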