Plachtaa / VITS-fast-fine-tuning

This repo is a VITS fine-tuning pipeline for fast speaker-adaptation TTS and many-to-many voice conversion
Apache License 2.0
4.69k stars 703 forks

Step 3 keeps failing with an error #491

Closed. youyi0218 closed this issue 10 months ago

youyi0218 commented 10 months ago

My input is long audio clipped from livestreams: 8 WAV files in total, none longer than 20 minutes. Step 3 fails at the end with this error:

Traceback (most recent call last):
  File "/content/VITS-fast-fine-tuning/scripts/long_audio_transcribe.py", line 41, in <module>
    result = model.transcribe(parent_dir + file, word_timestamps=True, **transcribe_options)
  File "/usr/local/lib/python3.10/dist-packages/whisper/transcribe.py", line 130, in transcribe
    _, probs = model.detect_language(mel_segment)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/whisper/decoding.py", line 40, in detect_language
    raise ValueError(
ValueError: This model doesn't have language tokens so it can't perform lang id
/usr/local/lib/python3.10/dist-packages/whisper/timing.py:57: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  def backtrace(trace: np.ndarray):
Warning: no short audios found, this IS expected if you have only uploaded long audios, videos or video links. this IS NOT expected if you have uploaded a zip file of short audios. Please check your file structure or make sure your audio language is supported.

Plachtaa commented 10 months ago

Fixed.