INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py:554: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
0% 0/1 [00:00<?, ?it/s]0
0% 0/1 [00:00<?, ?it/s]
Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x7ff5e07bd700>
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 1466, in __del__
self._shutdown_workers()
File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 1397, in _shutdown_workers
if not self._shutdown:
AttributeError: '_MultiProcessingDataLoaderIter' object has no attribute '_shutdown'
Traceback (most recent call last):
File "/content/vits-finetuning/train_ms.py", line 306, in <module>
main()
File "/content/vits-finetuning/train_ms.py", line 56, in main
mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
File "/usr/local/lib/python3.9/dist-packages/torch/multiprocessing/spawn.py", line 240, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/usr/local/lib/python3.9/dist-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
while not context.join():
File "/usr/local/lib/python3.9/dist-packages/torch/multiprocessing/spawn.py", line 160, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "/content/vits-finetuning/train_ms.py", line 124, in run
train_and_evaluate(rank, epoch, hps, [net_g, net_d], [optim_g, optim_d], [scheduler_g, scheduler_d], scaler, [train_loader, eval_loader], logger, [writer, writer_eval])
File "/content/vits-finetuning/train_ms.py", line 144, in train_and_evaluate
for batch_idx, (x, x_lengths, spec, spec_lengths, y, y_lengths, speakers) in enumerate(tqdm(train_loader)):
File "/usr/local/lib/python3.9/dist-packages/tqdm/std.py", line 1178, in __iter__
for obj in iterable:
File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 435, in __iter__
return self._get_iterator()
File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 381, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 988, in __init__
super(_MultiProcessingDataLoaderIter, self).__init__(loader)
File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 598, in __init__
self._sampler_iter = iter(self._index_sampler)
File "/content/vits-finetuning/data_utils.py", line 360, in __iter__
ids_bucket = ids_bucket + ids_bucket * (rem // len_bucket) + ids_bucket[:(rem % len_bucket)]
ZeroDivisionError: integer division or modulo by zero
While tuning the hyperparameters myself, the Colab training cell failed with the error shown above.
I am using my own Chinese model, with 50 samples in the training set and 18 in the test set. What could be causing this?