Plachtaa / VITS-fast-fine-tuning

This repo is a pipeline of VITS finetuning for fast speaker adaptation TTS, and many-to-many voice conversion
Apache License 2.0

Output question mark(?) during training #534

Open ridethepig opened 9 months ago

ridethepig commented 9 months ago

On my setup, inference and fine-tuning both run without fatal errors and the generated results seem correct, but I notice that "?" is printed repeatedly during training. Is this expected?

I traced the output to this function in common.py:

def slice_segments(x, ids_str, segment_size=4):
  ret = torch.zeros_like(x[:, :, :segment_size])
  for i in range(x.size(0)):
    idx_str = ids_str[i]
    idx_end = idx_str + segment_size
    try:
      ret[i] = x[i, :, idx_str:idx_end]
    except RuntimeError:
      print("?")
  return ret

I tried printing the exception, and the output looks like this:

The expanded size of the tensor (32) must match the existing size (0) at non-singleton dimension 1.  Target sizes: [192, 32].  Tensor sizes: [192, 0]
?
The expanded size of the tensor (32) must match the existing size (0) at non-singleton dimension 1.  Target sizes: [80, 32].  Tensor sizes: [80, 0]
?
The expanded size of the tensor (8192) must match the existing size (0) at non-singleton dimension 1.  Target sizes: [1, 8192].  Tensor sizes: [0]
?
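
The "existing size (0)" in these messages means the slice x[i, :, idx_str:idx_end] came out empty along the time axis. Below is a minimal pure-Python sketch of when that can happen. It assumes tensor slicing follows Python's usual rules for negative and out-of-range bounds; the helper name slice_width and the example numbers are hypothetical and not part of the repo. Whether the start indices in this training run are actually negative or past the end is an assumption that would need to be checked by printing idx_str and x.size(2).

```python
def slice_width(start, segment_size, length):
    """Width of seq[start:start + segment_size] for a sequence of `length` items.

    Hypothetical helper: it mirrors Python slice semantics, where negative
    bounds wrap around and out-of-range bounds are clamped.
    """
    return len(range(length)[start:start + segment_size])


# Start index well inside the sequence: a full 32-frame segment.
print(slice_width(10, 32, 100))

# Start index at the very end: nothing is left to slice, so the width is 0.
print(slice_width(100, 32, 100))

# Negative start index: -5 wraps to 95 while the end stays at 27,
# so the slice is empty and the width is again 0.
print(slice_width(-5, 32, 100))
```

If the start index can go negative, for example when a sample is shorter than segment_size, the third case would produce exactly this kind of zero-width slice, and the assignment into ret[i] would then fail with a size mismatch.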

Thanks for your great work on this project. I look forward to your reply.

By the way, my environment is as follows:

  • GPU: RTX4090
  • OS: Linux Mint (almost the same as Ubuntu 22.04)
  • CUDA 12.2 with cuDNN 8.9.4
  • python==3.8.0
  • torch==2.1.1+cu118, torchaudio==2.1.1+cu118