Plachtaa / VITS-fast-fine-tuning

This repo is a pipeline of VITS finetuning for fast speaker adaptation TTS, and many-to-many voice conversion
Apache License 2.0

Warning: no short audios found #526

Closed. DogeLord081 closed this issue 9 months ago

DogeLord081 commented 9 months ago

My folder structure is:

custom_character_voice
├───Megumin_1
│   ├───Megumin_1.wav
│   ├───Megumin_2.wav
│   ├───...
│   └───Megumin_1324.wav

But on step 3 of the Colab notebook, I keep getting this error:

error: XDG_RUNTIME_DIR not set in the environment.
ALSA lib confmisc.c:855:(parse_card) cannot find card '0'
ALSA lib conf.c:5178:(_snd_config_evaluate) function snd_func_card_inum returned error: No such file or directory
ALSA lib confmisc.c:422:(snd_func_concat) error evaluating strings
ALSA lib conf.c:5178:(_snd_config_evaluate) function snd_func_concat returned error: No such file or directory
ALSA lib confmisc.c:1334:(snd_func_refer) error evaluating name
ALSA lib conf.c:5178:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:5701:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM default
ALSA lib confmisc.c:855:(parse_card) cannot find card '0'
ALSA lib conf.c:5178:(_snd_config_evaluate) function snd_func_card_inum returned error: No such file or directory
ALSA lib confmisc.c:422:(snd_func_concat) error evaluating strings
ALSA lib conf.c:5178:(_snd_config_evaluate) function snd_func_concat returned error: No such file or directory
ALSA lib confmisc.c:1334:(snd_func_refer) error evaluating name
ALSA lib conf.c:5178:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:5701:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM default
100%|██████████████████████████████████████| 2.88G/2.88G [00:13<00:00, 237MiB/s]
Warning: no long audios & videos found, this IS expected if you have only uploaded short audios
this IS NOT expected if you have uploaded any long audios, videos or video links. Please check your file structure or make sure your audio/video language is supported.
Warning: no short audios found, this IS expected if you have only uploaded long audios, videos or video links.
this IS NOT expected if you have uploaded a zip file of short audios. Please check your file structure or make sure your audio language is supported.
Lu233 commented 9 months ago

I got the exact same issue with Colab. Any idea how it could be fixed?

DogeLord081 commented 9 months ago

> I got the exact same issue with Colab. Any idea how it could be fixed?

Yep, I found the solution. Change line 19 in scripts/short_audio_transcribe.py to:

mel = whisper.log_mel_spectrogram(audio, n_mels=128).to(model.device)
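A likely reason this works: Whisper's large-v3 checkpoint expects 128 mel bins, while earlier checkpoints expect 80, and the 2.88G download in the log above is consistent with a large model. Assuming that is the cause, a slightly more general variant reads the bin count from the loaded model instead of hardcoding 128. The sketch below is only an illustration: `mel_bins_for` is a hypothetical helper, not part of the repo, and the two stand-in objects merely mimic the `dims.n_mels` attribute that openai-whisper models expose.

```python
from types import SimpleNamespace


def mel_bins_for(model, default=80):
    """Return the number of mel bins the given model expects.

    Hypothetical helper: reads model.dims.n_mels when present and
    falls back to `default` otherwise.
    """
    dims = getattr(model, "dims", None)
    return getattr(dims, "n_mels", default)


# With openai-whisper installed, line 19 could then be written as:
#   mel = whisper.log_mel_spectrogram(audio, n_mels=mel_bins_for(model)).to(model.device)

# Stand-ins for loaded models (illustration only, not real Whisper objects).
large_v3_like = SimpleNamespace(dims=SimpleNamespace(n_mels=128))
older_like = SimpleNamespace(dims=SimpleNamespace(n_mels=80))

print(mel_bins_for(large_v3_like), mel_bins_for(older_like))
```

With this variant the same line should keep working if the script is later pointed at a checkpoint that uses 80 mel bins.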

hduwym commented 9 months ago

I had the same problem too, but after making the change from your answer, it still wasn't fixed. Now it seems to report a CUDA problem, even though I am connected to a T4, so I'm still confused. Can you help me figure out how to solve this? erro