erew123 / alltalk_tts

AllTalk is based on the Coqui TTS engine, similar to the Coqui_tts extension for Text generation webUI, but supports a variety of advanced features, such as a settings page, low VRAM support, DeepSpeed, narrator, model finetuning, custom models, and wav file maintenance. It can also be used with third-party software via JSON calls.
GNU Affero General Public License v3.0

FFMPEG RuntimeError: Failed to open the input in finetune.py #243

Closed tom2698 closed 1 month ago

tom2698 commented 1 month ago

🔴 If you have installed AllTalk in a custom Python environment, I will only be able to provide limited assistance/support. AllTalk draws on a variety of scripts and libraries that are not written or managed by myself, and they may fail, error or give strange results in custom built python environments.

🔴 Please generate a diagnostics report and upload the "diagnostics.log" as this helps me understand your configuration.

https://github.com/erew123/alltalk_tts/tree/main?#-how-to-make-a-diagnostics-report-file

Describe the bug
After opening finetune.py and clicking create dataset, it returns this error:
RuntimeError: Failed to open the input ".../alltalk_tts/finetune/tmp-trn/temp/custom_tempfile_1717569516_514.wav" (Invalid data found when processing input).
I tried downgrading ffmpeg to 6.0-16.fc39 from 6.1.1-5.fc9, but the issue persists. Running on Fedora Linux.
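A quick way to confirm which ffmpeg build is actually picked up after a downgrade is sketched below. This is only an illustration and not part of AllTalk: `ffmpeg_version_line` is a hypothetical helper name, and it inspects only the `ffmpeg` command-line binary found on PATH. The traceback in this report goes through `libtorio_ffmpeg6.so`, so torchaudio appears to load FFmpeg's shared libraries rather than the CLI, and a healthy result here does not prove torchaudio sees the same build.

```python
import shutil
import subprocess


def ffmpeg_version_line(executable="ffmpeg"):
    """Return the first line of `<executable> -version`, or None if it is not on PATH.

    Hypothetical helper for illustration; it is not an AllTalk function.
    """
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "-version"],
        capture_output=True,
        text=True,
        check=False,
        timeout=10,
    )
    lines = result.stdout.splitlines()
    # An empty string means the binary ran but printed nothing on stdout.
    return lines[0] if lines else ""


if __name__ == "__main__":
    print(ffmpeg_version_line() or "ffmpeg not found on PATH")
```

Run it with the same Python interpreter (and the same shell environment) that launches finetune.py, so that PATH matches what the script sees.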

To Reproduce
Steps to reproduce the behaviour:

  1. Add a clip to the add clips here folder for fine tuning
  2. Open finetune.py
  3. Press create a dataset

Text/logs
The data processing was interrupted due an error !! Please check the console to verify the full error message! Error summary:
Traceback (most recent call last):
  File "/models2/VoiceModels/alltalk/alltalk_tts/finetune.py", line 1395, in preprocess_dataset
    train_meta, eval_meta, audio_total_size = format_audio_list(target_language=language, whisper_model=whisper_model, out_path=out_path, eval_split_number=eval_split_number, speaker_name_input=speaker_name_input, gradio_progress=progress)
  File "/models2/VoiceModels/alltalk/alltalk_tts/finetune.py", line 385, in format_audio_list
    wav, sr = torchaudio.load(temp_audio_path, format="wav")
  File "/models2/VoiceModels/alltalk/alltalk_tts/venv/lib64/python3.10/site-packages/torchaudio/_backend/utils.py", line 205, in load
    return backend.load(uri, frame_offset, num_frames, normalize, channels_first, format, buffer_size)
  File "/models2/VoiceModels/alltalk/alltalk_tts/venv/lib64/python3.10/site-packages/torchaudio/_backend/ffmpeg.py", line 297, in load
    return load_audio(uri, frame_offset, num_frames, normalize, channels_first, format)
  File "/models2/VoiceModels/alltalk/alltalk_tts/venv/lib64/python3.10/site-packages/torchaudio/_backend/ffmpeg.py", line 88, in load_audio
    s = torchaudio.io.StreamReader(src, format, None, buffer_size)
  File "/models2/VoiceModels/alltalk/alltalk_tts/venv/lib64/python3.10/site-packages/torio/io/_streaming_media_decoder.py", line 526, in __init__
    self._be = ffmpeg_ext.StreamingMediaDecoder(os.path.normpath(src), format, option)
RuntimeError: Failed to open the input "/models2/VoiceModels/alltalk/alltalk_tts/finetune/tmp-trn/temp/custom_tempfile_1717569516_514.wav" (Invalid data found when processing input).
Exception raised from get_input_format_context at /__w/audio/audio/pytorch/audio/src/libtorio/ffmpeg/stream_reader/stream_reader.cpp:42 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f6a040cf897 in /models2/VoiceModels/alltalk/alltalk_tts/venv/lib64/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const, char const, unsigned int, std::string const&) + 0x64 (0x7f6a0407fb25 in /models2/VoiceModels/alltalk/alltalk_tts/venv/lib64/python3.10/site-packages/torch/lib/libc10.so)
frame #2: + 0x42334 (0x7f69ffecb334 in /models2/VoiceModels/alltalk/alltalk_tts/venv/lib/python3.10/site-packages/torio/lib/libtorio_ffmpeg6.so)
frame #3: torio::io::StreamingMediaDecoder::StreamingMediaDecoder(std::string const&, std::optional const&, std::optional<std::map<std::string, std::string, std::less, std::allocator<std::pair<std::string const, std::string> > > > const&) + 0x14 (0x7f69ffecdd34 in /models2/VoiceModels/alltalk/alltalk_tts/venv/lib/python3.10/site-packages/torio/lib/libtorio_ffmpeg6.so)
frame #4: + 0x3aa4e (0x7f694491aa4e in /models2/VoiceModels/alltalk/alltalk_tts/venv/lib64/python3.10/site-packages/torio/lib/_torio_ffmpeg6.so)
frame #5: + 0x32617 (0x7f6944912617 in /models2/VoiceModels/alltalk/alltalk_tts/venv/lib64/python3.10/site-packages/torio/lib/_torio_ffmpeg6.so)
frame #11: + 0xf6cb (0x7f6a061156cb in /models2/VoiceModels/alltalk/alltalk_tts/venv/lib64/python3.10/site-packages/torchaudio/lib/_torchaudio.so)
frame #45: + 0x8e897 (0x7f6a52aac897 in /lib64/libc.so.6)
frame #46: + 0x11580c (0x7f6a52b3380c in /lib64/libc.so.6)
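The failing call is `torchaudio.load(temp_audio_path, format="wav")`, and "Invalid data found when processing input" is FFmpeg's message for input it cannot parse in the requested format. One way to narrow this down, sketched below under the assumption that the temporary file is still on disk, is to check whether the file really begins with a RIFF/WAVE header. `describe_wav_header` is a hypothetical helper, not an AllTalk or torchaudio function, and it reads only the first 12 bytes, so it cannot detect damage later in the file.

```python
import os
import struct


def describe_wav_header(path):
    """Return a short verdict on whether `path` starts with a RIFF/WAVE header.

    Hypothetical diagnostic helper; it reads only the first 12 bytes.
    """
    if not os.path.isfile(path):
        return "missing"
    if os.path.getsize(path) < 12:
        return "too short"
    with open(path, "rb") as fh:
        # Layout: 4-byte "RIFF" tag, 4-byte little-endian size, 4-byte "WAVE" tag.
        riff, _chunk_size, wave_tag = struct.unpack("<4sI4s", fh.read(12))
    if riff == b"RIFF" and wave_tag == b"WAVE":
        return "riff-wave"
    return "not riff-wave"
```

Calling it on the path quoted in the RuntimeError would show whether the temp file is absent, empty, or some other container saved with a .wav extension. A "riff-wave" verdict would point away from the file itself and towards the environment.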

Desktop (please complete the following information):
AllTalk was updated: [approx. date]: Most recent
Custom Python environment: [yes/no give details if yes]: Yes. The one that was used in the setup
Text-generation-webUI was updated: [approx. date]: Most recent

tom2698 commented 1 month ago

Never mind. Am a silly goose and didn't install it properly.

Oninaig commented 1 week ago

@tom2698 what didn't you install properly? I'm running into the same issue.

tom2698 commented 1 week ago

I forget exactly. I vaguely remember missing a step in the instructions though. So try running through the instructions again and make sure you have the correct versions of everything.
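For anyone re-checking versions as suggested, a minimal sketch for listing what is installed in the active environment follows. The package names `torch` and `torchaudio` are taken from the traceback in this issue. `installed_versions` is a hypothetical helper, and the versions AllTalk expects are not stated in this thread, so compare the output against the project's own setup instructions.

```python
import sys
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent.

    Hypothetical helper for illustration; it is not part of AllTalk.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("python", sys.version.split()[0])
    for name, version in installed_versions(["torch", "torchaudio"]).items():
        print(name, version or "not installed")
```

Run it with the interpreter inside the venv that AllTalk uses (the traceback shows one under alltalk_tts/venv), otherwise it reports the system Python's packages instead.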