SayaSS / vits-finetuning

Fine-Tuning your VITS model using a pre-trained model
MIT License

Colab training error #15

Closed · Watee22 closed this issue 1 year ago

Watee22 commented 1 year ago

While tuning the parameters on my own, I found that running the Colab training cell produced the following error:

```
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py:554: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
  0% 0/1 [00:00<?, ?it/s]0
  0% 0/1 [00:00<?, ?it/s]
Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x7ff5e07bd700>
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 1466, in __del__
    self._shutdown_workers()
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 1397, in _shutdown_workers
    if not self._shutdown:
AttributeError: '_MultiProcessingDataLoaderIter' object has no attribute '_shutdown'
Traceback (most recent call last):
  File "/content/vits-finetuning/train_ms.py", line 306, in <module>
    main()
  File "/content/vits-finetuning/train_ms.py", line 56, in main
    mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
  File "/usr/local/lib/python3.9/dist-packages/torch/multiprocessing/spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/usr/local/lib/python3.9/dist-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File "/usr/local/lib/python3.9/dist-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/content/vits-finetuning/train_ms.py", line 124, in run
    train_and_evaluate(rank, epoch, hps, [net_g, net_d], [optim_g, optim_d], [scheduler_g, scheduler_d], scaler, [train_loader, eval_loader], logger, [writer, writer_eval])
  File "/content/vits-finetuning/train_ms.py", line 144, in train_and_evaluate
    for batch_idx, (x, x_lengths, spec, spec_lengths, y, y_lengths, speakers) in enumerate(tqdm(train_loader)):
  File "/usr/local/lib/python3.9/dist-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 435, in __iter__
    return self._get_iterator()
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 381, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 988, in __init__
    super(_MultiProcessingDataLoaderIter, self).__init__(loader)
  File "/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py", line 598, in __init__
    self._sampler_iter = iter(self._index_sampler)
  File "/content/vits-finetuning/data_utils.py", line 360, in __iter__
    ids_bucket = ids_bucket + ids_bucket * (rem // len_bucket) + ids_bucket[:(rem % len_bucket)]
ZeroDivisionError: integer division or modulo by zero
```

I'm using my own Chinese model, with 50 clips in the training set and 18 in the test set. What could be causing this?
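
The expression at the bottom of the traceback pads each length bucket by repeating its indices, so it divides and takes a modulo by the bucket's length; if a bucket receives no utterances at all, that length is zero. A hypothetical minimal reproduction of the failing line:

```python
# Hypothetical reproduction of the failing expression (data_utils.py line 360 in the traceback).
# When no clip falls into a bucket's length range, the bucket is empty and the
# padding expression divides by zero.
ids_bucket = []                 # empty bucket: no utterances matched this length range
len_bucket = len(ids_bucket)    # 0
rem = 4                         # extra samples needed to fill the batch (illustrative value)

padded = ids_bucket + ids_bucket * (rem // len_bucket) + ids_bucket[:(rem % len_bucket)]
# ZeroDivisionError: integer division or modulo by zero
```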

SayaSS commented 1 year ago

Assuming your wav files are all 22050 Hz, 16-bit, mono: the problem is that every wav file in your dataset is larger than 150 KB (roughly 3 s), so when the data is split into batches some buckets end up empty. I've made a fairly simple fix, so you can continue training with your original dataset. Incidentally, with the current settings any wav file larger than 500 KB (roughly 10 s) will not be read at all, so if you have files longer than that it's best to cut them up.
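
A minimal sketch of that kind of guard, assuming a bucket sampler that pads each bucket up to a fixed number of samples: empty buckets are simply skipped instead of being padded. The names and values below are illustrative, not the exact code in this repository:

```python
# Illustrative sketch only: skip empty buckets so the padding step never divides by zero.
buckets = [[], [0, 1, 2], [3]]          # first bucket is empty (e.g. no clip fell in its length range)
num_samples_per_bucket = [4, 4, 4]      # target size of each bucket after padding
batches = []

for i, ids_bucket in enumerate(buckets):
    len_bucket = len(ids_bucket)
    if len_bucket == 0:
        continue                        # nothing to batch in this bucket, skip it
    rem = num_samples_per_bucket[i] - len_bucket
    padded = ids_bucket + ids_bucket * (rem // len_bucket) + ids_bucket[:(rem % len_bucket)]
    batches.append(padded)

print(batches)                          # [[0, 1, 2, 0], [3, 3, 3, 3]]
```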