Plachtaa / VITS-fast-fine-tuning

This repo is a pipeline of VITS finetuning for fast speaker adaptation TTS, and many-to-many voice conversion
Apache License 2.0

Help: files not found when preprocessing a short-audio dataset #495

Open phelixzhen opened 10 months ago

phelixzhen commented 10 months ago

As the title says, this happens while preprocessing short audio files. When I reach the step python scripts/short_audio_transcribe.py --languages "{CJE}" --whisper_size large, I get the following message:

Warning: no short audios found, this IS expected if you have only uploaded long audios, videos or video links. this IS NOT expected if you have uploaded a zip file of short audios. Please check your file structure or make sure your audio language is supported.

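My file structure looks like this:

custom_character_voice\Character_name_1: processed_0.wav processed_1.wav ...

There is nothing else in the directory. The message appears whether or not I also put a zip file in there. Any help would be appreciated.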

gymeee0715 commented 10 months ago

I'm not sure, but perhaps processed_0.wav and processed_1.wav need to go under the raw_audio folder?

shirubei commented 10 months ago

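As far as I remember, zip files are not extracted automatically. It should be enough to put the extracted audio files in the directory named after the character under custom_character_voice. For example, if the character is named Liming, put the audio files in custom_character_voice\Liming.

Also, when this step finishes it generates processed_xx.wav files in that same directory. If your original files already use that naming pattern, I don't know whether that causes a problem. If it does, rename the original files.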
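The layout described above can be checked before running the transcription step. The following is only a rough sketch: the function name scan_voice_dir, the extension list, and the idea of flagging files whose names start with processed_ are illustrative assumptions based on this comment, not code taken from the repository.

```python
from pathlib import Path

# Assumed set of audio extensions; the real script may accept others.
AUDIO_EXTS = {".wav", ".mp3", ".flac", ".ogg", ".m4a"}


def scan_voice_dir(root):
    """List audio files per speaker folder under `root`.

    Files whose names start with "processed_" are reported separately,
    because they use the same naming pattern as the script's own output.
    """
    report = {}
    root = Path(root)
    if not root.is_dir():
        return report
    for speaker_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        inputs, processed_named = [], []
        for f in sorted(speaker_dir.iterdir()):
            # Ignore sub-folders and non-audio files such as zip archives.
            if not f.is_file() or f.suffix.lower() not in AUDIO_EXTS:
                continue
            if f.name.startswith("processed_"):
                processed_named.append(f.name)
            else:
                inputs.append(f.name)
        report[speaker_dir.name] = {
            "inputs": inputs,
            "processed_named": processed_named,
        }
    return report
```

Calling scan_voice_dir("custom_character_voice") returns one entry per character folder. An empty "inputs" list together with a non-empty "processed_named" list would match the situation described in the original post, and renaming those files is then a cheap thing to try.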

luuumity commented 9 months ago

In short_audio_transcribe.py I changed the except clause to:

except Exception as e:
    print("Error:", str(e))
    continue

After that I could see the actual error. In my case it was: Given groups=1, weight of size [1280, 128, 3], expected input[1, 80, 3000] to have 128 channels, but got 80 channels instead

I then edited short_audio_transcribe.py, changing n_mels=128 to n_mels=80: mel = whisper.log_mel_spectrogram(audio,n_mels=80).to(model.device) After that, transcription worked normally.
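The debugging step in this comment, printing the exception instead of silently skipping the file, can be sketched as a general pattern. The names run_all and handler below are hypothetical and not part of short_audio_transcribe.py; the sketch only shows the pattern of reporting each failure and then continuing.

```python
def run_all(items, handler):
    """Apply `handler` to each item, reporting failures instead of hiding them.

    Returns (results, failures), where failures is a list of
    (item, error message) pairs.
    """
    results, failures = [], []
    for item in items:
        try:
            results.append(handler(item))
        except Exception as e:
            # Print the exception type and message so the real cause is visible.
            print(f"Error while processing {item!r}: {type(e).__name__}: {e}")
            failures.append((item, str(e)))
            continue
    return results, failures
```

If every item ends up in failures with the same message, as with the channel mismatch quoted above, the problem is more likely in the processing code or model setup than in the audio files themselves.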

ihmily commented 9 months ago

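I ran into this problem too. You can modify the code so that the caught exception is printed. The error message is: Given groups=1, weight of size [1280, 128, 3], expected input[1, 80, 3000] to have 128 channels, but got 80 channels instead

The fix is to change this line: mel = whisper.log_mel_spectrogram(audio).to(model.device) to: mel = whisper.log_mel_spectrogram(audio,n_mels = 128).to(model.device)

The official source supports two values, 80 and 128, and the default is 80. After changing n_mels to 128 it may run normally. If that still does not solve the problem, try setting n_mels explicitly to 80.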

https://github.com/openai/whisper/pull/1764#issuecomment-1817497095
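Rather than hard-coding 80 or 128, the value can be read from the loaded model. The sketch below assumes that the Whisper model object exposes the expected number of mel bins as model.dims.n_mels, which is my understanding of recent openai-whisper releases; the helper mel_bins_for and the stand-in classes are hypothetical and exist only so the example runs without downloading a model.

```python
from dataclasses import dataclass


@dataclass
class FakeDims:
    # Stand-in for the model's dimensions object (assumption).
    n_mels: int


@dataclass
class FakeModel:
    # Stand-in for a loaded Whisper model (assumption).
    dims: FakeDims


def mel_bins_for(model, default=80):
    """Return the number of mel bins the model expects, or `default`
    if the model object does not expose dims.n_mels."""
    dims = getattr(model, "dims", None)
    return getattr(dims, "n_mels", default)


# With the real library, the call in the script would then look like:
#   mel = whisper.log_mel_spectrogram(
#       audio, n_mels=mel_bins_for(model)
#   ).to(model.device)
```

This keeps the spectrogram shape in step with whichever checkpoint --whisper_size loads, so the same line should work for models that expect 80 bins and for those that expect 128.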

codewen77 commented 9 months ago

Thanks, that worked!