Sorry, I have no clear answer for these errors. Let me check some points:
- Does the existing recipe work?
Should I fine-tune the VCTK VITS model with the VCTK dataset, or is there an existing recipe for fine-tuning a multi-speaker VITS model with a multi-speaker dataset?
(Sorry, I pressed the wrong button...) I have tried fine-tuning the LJ FastSpeech 2 model with the data of one of the speakers from my dataset, and it works. Maybe there's something wrong with my multi-speaker dataset? But how can I find out what's wrong? Thank you!
> Should I fine-tune the VCTK VITS model with the VCTK dataset, or is there an existing recipe for fine-tuning a multi-speaker VITS model with a multi-speaker dataset?
I think this is not related to finetuning. Maybe the error will happen without `--init_param`.
> I have tried fine-tuning the LJ FastSpeech 2 model with the data of one of the speakers from my dataset, and it works.
OK. Have you ever tried VITS?
> Maybe there's something wrong with my multi-speaker dataset? But how can I find out what's wrong?
Does your dataset include only mono audio? If it contains a mix of mono and stereo, that will be a problem.
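For example, something along these lines can flag non-mono files (a minimal sketch, not part of the recipe; the `soundfile` dependency and the `data/wavs` directory are assumptions for illustration):

```python
# Sketch: flag non-mono wav files before feature extraction.
# Assumptions (not from the recipe): `soundfile` is installed and the
# corpus wavs live under data/wavs.
from pathlib import Path

import soundfile as sf

for wav in sorted(Path("data/wavs").rglob("*.wav")):
    info = sf.info(str(wav))
    if info.channels != 1:
        print(f"{wav}: {info.channels} channels at {info.samplerate} Hz")
```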
> Does your dataset include only mono audio? If it contains a mix of mono and stereo, that will be a problem.
I checked all the data and they are all mono.
> Have you ever tried VITS?
Not yet. I will also try fine-tuning the LJ VITS model with one of the speakers' data and see whether the problem is specific to VITS.
@kan-bayashi I've started to finetune the LJ VITS model (`kan-bayashi/ljspeech_tts_train_vits_raw_phn_tacotron_g2p_en_no_space_train.total_count.ave`) with the data of one of the speakers in my dataset. It has run for several epochs now with no errors (except OOM, which I fixed by lowering `batch_bins`).
So I guess the error I met before comes from the multi-speaker related part.
Thank you for your kind report. Then, let us check each point.
```sh
run.sh --stage 6 --stop-stage 6 --tts_task gan_tts --feats_extract linear_spectrogram --feats_normalize none --train_config ./conf/tuning/finetune_vits.yaml --use-sid true --tag debug_1
run.sh --stage 6 --stop-stage 6 --tts_task gan_tts --feats_extract linear_spectrogram --feats_normalize none --train_config ./conf/tuning/<single_speaker_vits>.yaml --tag debug_2
```
Running on CPU might provide a more informative message (I believe this is a bug on the espnet side).
> 1. Run w/o `--init_param`
This produces the error too.
> 2. Run w/o SID
This can start without error.
So I guess the problem comes from the multi-speaker part.
> Running on CPU might provide a more informative message
I am running on a cloud GPU, so I ran it in no-GPU mode, and it said `RuntimeError: No CUDA GPUs are available`.
`--ngpu 0` does not work in the GPU env?
> `--ngpu 0` does not work in the GPU env?
With `--ngpu 0`, the error message is:

```
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/root/miniconda3/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/root/autodl-tmp/espnet/espnet2/bin/gan_tts_train.py", line 22, in <module>
main()
File "/root/autodl-tmp/espnet/espnet2/bin/gan_tts_train.py", line 18, in main
GANTTSTask.main(cmd=cmd)
File "/root/autodl-tmp/espnet/espnet2/tasks/abs_task.py", line 1019, in main
cls.main_worker(args)
File "/root/autodl-tmp/espnet/espnet2/tasks/abs_task.py", line 1315, in main_worker
cls.trainer.run(
File "/root/autodl-tmp/espnet/espnet2/train/trainer.py", line 286, in run
all_steps_are_invalid = cls.train_one_epoch(
File "/root/autodl-tmp/espnet/espnet2/train/gan_trainer.py", line 160, in train_one_epoch
retval = model(forward_generator=turn == "generator", **batch)
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/root/autodl-tmp/espnet/espnet2/gan_tts/espnet_model.py", line 162, in forward
return self.tts(**batch)
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/root/autodl-tmp/espnet/espnet2/gan_tts/vits/vits.py", line 315, in forward
return self._forward_discrminator(
File "/root/autodl-tmp/espnet/espnet2/gan_tts/vits/vits.py", line 478, in _forward_discrminator
outs = self.generator(
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/root/autodl-tmp/espnet/espnet2/gan_tts/vits/generator.py", line 321, in forward
g = self.global_emb(sids.view(-1)).unsqueeze(-1)
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 158, in forward
return F.embedding(
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/functional.py", line 2044, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self
```
This is the true error.
You set `spks: 5` in the yaml, but maybe the sid is not within 1-5. Please check `dump/raw/org/tr_no_dev/spk2sid` (the path may be different).
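For illustration, here is a minimal sketch (not espnet code) of why this lookup fails, plus a quick way to derive a safe `spks` value from `spk2sid`; the two-column "speaker sid" format and the path are assumptions based on the message above:

```python
# Not espnet code: a minimal illustration of the IndexError above.
# An embedding table sized spks=5 only accepts indices 0-4, so a sid
# of 5 triggers "index out of range in self".
import torch

emb = torch.nn.Embedding(num_embeddings=5, embedding_dim=192)
print(emb(torch.tensor([4])).shape)  # OK: torch.Size([1, 192])
try:
    emb(torch.tensor([5]))  # one past the end of the table
except IndexError as err:
    print(err)  # index out of range in self

# Hedged check: derive the minimum safe `spks` from spk2sid, assuming
# the usual two-column "speaker sid" format (the path may differ).
with open("dump/raw/org/tr_no_dev/spk2sid") as f:
    max_sid = max(int(line.split()[1]) for line in f)
print(f"max sid = {max_sid}; spks must be at least {max_sid + 1}")
```

If the sids here are 1-based (1-5 for five speakers), an embedding of size 5 cannot hold sid 5, which would explain the fix below.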
That solves the problem! I set `spks: 6` and the training can start now.
Thank you so much!
Hi, I was trying to finetune the VCTK VITS model (`kan-bayashi/vctk_tts_train_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space_train.total_count.ave`) with my own dataset, but I met the error:

I ran the training with

My config is

What am I missing? Thank you!