PlayVoice / vits_chinese

Best practice TTS based on BERT and VITS with some Natural Speech Features Of Microsoft; Support ONNX streaming out!
https://huggingface.co/spaces/maxmax20160403/vits_chinese
MIT License
1.16k stars, 167 forks

RuntimeError: The expanded size of the tensor (50) must match the existing size (0) at non-singleton dimension 1. Target sizes: [192, 50]. Tensor sizes: [192, 0] #78

Open m1258218761 opened 1 year ago

m1258218761 commented 1 year ago

Traceback (most recent call last):
  File "/root/miniconda3/envs/vits/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/data1/Minxin/TTS/vits_chinese/train.py", line 161, in run
    train_and_evaluate(
  File "/data1/Minxin/TTS/vits_chinese/train.py", line 219, in train_and_evaluate
    (z, z_p, z_r, m_p, logs_p, m_q, logs_q) = net_g(x, x_lengths, bert, spec, spec_lengths)
  File "/root/miniconda3/envs/vits/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/miniconda3/envs/vits/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1008, in forward
    output = self._run_ddp_forward(*inputs, **kwargs)
  File "/root/miniconda3/envs/vits/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 969, in _run_ddp_forward
    return module_to_run(*inputs[0], **kwargs[0])
  File "/root/miniconda3/envs/vits/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data1/Minxin/TTS/vits_chinese/models.py", line 543, in forward
    z_slice, ids_slice = commons.rand_slice_segments(
  File "/data1/Minxin/TTS/vits_chinese/commons.py", line 65, in rand_slice_segments
    ret = slice_segments(x, ids_str, segment_size)
  File "/data1/Minxin/TTS/vits_chinese/commons.py", line 55, in slice_segments
    ret[i] = x[i, :, idx_str:idx_end]
RuntimeError: The expanded size of the tensor (50) must match the existing size (0) at non-singleton dimension 1. Target sizes: [192, 50]. Tensor sizes: [192, 0]

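Could anyone tell me what is causing this? I am training on my own dataset, and the audio has been resampled to 16 kHz. Training ran normally for a few dozen batches and then raised this error.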

MaxMax2016 commented 1 year ago

It looks like one of the audio files is faulty.
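
One way to look for a faulty file is to scan the training audio and list every clip by duration, together with any file that cannot be opened. The sketch below is only an illustration and is not part of the repository: `scan_wavs` is a hypothetical helper, it assumes the dataset is stored as PCM `.wav` files under a single directory, and it uses only the Python standard library.

```python
import wave
from pathlib import Path


def scan_wavs(wav_dir):
    """Return (durations, unreadable) for every .wav file under wav_dir.

    durations  : list of (seconds, path), shortest clip first
    unreadable : list of (path, error description)
    """
    durations, unreadable = [], []
    for path in sorted(Path(wav_dir).rglob("*.wav")):
        try:
            with wave.open(str(path), "rb") as wav:
                seconds = wav.getnframes() / wav.getframerate()
        except (wave.Error, EOFError, OSError, ZeroDivisionError) as exc:
            # Truncated, empty, or non-PCM files end up here.
            unreadable.append((path, repr(exc)))
            continue
        durations.append((seconds, path))
    durations.sort(key=lambda item: item[0])  # shortest first
    return durations, unreadable
```

Calling `scan_wavs` on the dataset directory and printing the first few entries of `durations` shows the shortest clips, which are the most likely candidates to inspect first.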

panxin801 commented 1 year ago

I ran into this problem too. Did you manage to solve it?

1006079161 commented 1 year ago

I ran into this problem too. Did you manage to solve it?

haiewu commented 1 year ago

The audio is too short. Each clip needs to be longer than 2 seconds.
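
The 50 in the error message is the number of frames the training step tries to slice out of each utterance, so a clip with fewer frames than that can produce an empty slice. The sketch below mimics, in plain Python, the start-index arithmetic of the original VITS `rand_slice_segments`. Whether this repository's `commons.py` matches it exactly is an assumption. `slice_width` and `min_clip_seconds` are hypothetical helpers, and the hop length of 256 is an assumed value that should be checked against your own config.

```python
import random


def slice_width(valid_frames, padded_frames, segment_size, rand=random.random):
    """Width of the slice taken for one utterance in a padded batch.

    Assumption: start index = int(rand * (valid_frames - segment_size + 1)),
    as in the original VITS code.
    """
    # Number of allowed start positions; <= 0 when the clip is too short.
    ids_str_max = valid_frames - segment_size + 1
    # int() truncates toward zero, like casting a float tensor to long.
    idx_str = int(rand() * ids_str_max)
    idx_end = idx_str + segment_size
    frames = list(range(padded_frames))  # stand-in for the time axis
    # A negative start wraps around to the end, so the slice can be empty.
    return len(frames[idx_str:idx_end])


def min_clip_seconds(segment_size, hop_length, sample_rate):
    """Shortest clip that still yields segment_size spectrogram frames."""
    return segment_size * hop_length / sample_rate


# A 30-frame clip padded to 200 frames, with a 50-frame segment:
print(slice_width(30, 200, 50, rand=lambda: 0.5))
# A 120-frame clip in the same batch slices normally:
print(slice_width(120, 200, 50, rand=lambda: 0.5))
# Lower bound on clip length, assuming hop_length=256 at 16 kHz:
print(min_clip_seconds(50, 256, 16000))
```

Under these assumptions the failure depends on the random draw, so a short clip does not fail every time it appears in a batch. Filtering out clips below a fixed threshold, such as the 2 seconds suggested above, leaves a comfortable margin over the computed lower bound.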

xueyuanZ commented 1 year ago

Hello, how did you solve this problem?