Open wisev3 opened 1 year ago
Thanks for the report. Can you tell me whether this error happens with a single GPU?
About TTS fine-tuning, you can refer to https://github.com/espnet/espnet/tree/master/egs2/qasr_tts/tts1
Thank you for your reply. I confirmed that the error also occurs on a single GPU, without multi-GPU training.
OK, thanks. @kan-bayashi, do you have any comments?
@kan-bayashi, I would be grateful for any comment on my issue.
Sorry for the late reply.
The dataset may include audio that is too short.
In the random windowed discriminator, we extract a segment from the entire sequence using the following segment_size:
https://github.com/espnet/espnet/blob/fc37f80ff96070107c61a2a020f9627f82d646c5/egs2/kss/tts1/conf/tuning/train_jets.yaml#L77
Therefore, we assume that the audio length > shift size * segment size.
You can remove such audio by passing --min_wav_duration 0.75 to run.sh.
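To make the length condition above concrete, here is a minimal sketch of how a minimum duration could be derived from a segment size and shift size, and how short utterances could be filtered out. The function names and the numbers (64 frames, shift of 256 samples, 22050 Hz) are illustrative assumptions, not values read from the recipe; check your own config before relying on them. With these assumed values the threshold comes out at roughly 0.74 s, which is close to the 0.75 suggested above.

```python
from typing import Dict


def min_duration_seconds(segment_size: int, shift_size: int, sample_rate: int) -> float:
    """Shortest duration (seconds) covering segment_size frames of shift_size samples each."""
    return segment_size * shift_size / sample_rate


def drop_short_utterances(durations: Dict[str, float], threshold: float) -> Dict[str, float]:
    """Keep only utterances strictly longer than the threshold (in seconds)."""
    return {utt_id: dur for utt_id, dur in durations.items() if dur > threshold}


if __name__ == "__main__":
    # Hypothetical values for illustration only; replace with those from your config.
    threshold = min_duration_seconds(segment_size=64, shift_size=256, sample_rate=22050)
    # Hypothetical utterance IDs and durations in seconds.
    durations = {"utt_a": 0.50, "utt_b": 1.20, "utt_c": 3.75}
    print(threshold)
    print(drop_short_utterances(durations, threshold))
```

In practice the recipe's own --min_wav_duration option does this filtering during data preparation; the sketch only shows where such a threshold could come from.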
I sincerely appreciate your answer. Training successfully started, and it seems to work. How long do I have to train the model for the KSS example? Do you have any suggestions?
Describe the bug
I followed the official instructions to install ESPnet2 and attempted to run the 'egs2/kss/tts1' recipe using the provided KSS dataset as an example. However, I encountered an error due to a size mismatch between the input and target tensors. Please refer to the error log file at the bottom of this report. I would appreciate any assistance in resolving this issue.
Additionally, I am looking for a beginner-friendly tutorial on fine-tuning TTS tasks using ESPnet2. Do you have any recommendations?
Basic environments:
Environments from torch.utils.collect_env:
Task information:
To Reproduce
Steps to reproduce the behavior:
cd egs2/kss/tts1
Error logs
train.2.log