Audio error processing dataset using "Begin Process" button

TomiVolt commented 1 week ago

Hi Jarod! My folders under datasets are model1\tomi\mono.wav. The wav file is mono.

The error that pops up shortly after it begins processing speakers is:

  File "C:\AI\beatrice_trainer_webui_v1.0\webui.py", line 70, in run_whisperx_transcribe
    audio = whisperx.load_audio(audio_file_path)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\AI\beatrice_trainer_webui_v1.0\runtime\Lib\site-packages\whisperx\audio.py", line 63, in load_audio
    raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e
RuntimeError: Failed to load audio:

JarodMica commented 1 week ago

Unfortunately, there's not much to go off of here, as it seems there wasn't much output from e. Would you be able to post more of the stack trace from the terminal? It also looks like you put the repo almost directly in your C drive, and I know Windows sometimes has odd permission issues, so could you try moving it to your desktop if the below works?

I wanna see if ffmpeg is working properly in your package. Can you try running a similar ffmpeg command by opening a cmd terminal in the package folder and running a modified version of the below, adjusted for your audio file name:

.\ffmpeg.exe -i ".\datasets\model1\tomi\mono.wav" -ar 48000 ".\test.wav"

TomiVolt commented 1 week ago

I moved this all to the desktop, then tried the ffmpeg command you sent from the new folder in the terminal. It didn't produce any error, but it also didn't create a new test.wav file, if it was supposed to. Here is that stack trace:

C:\Users\Raoc\Desktop\beatrice_trainer_webui_v1.0>.\ffmpeg.exe -i ".\datasets\model1\tomi\mono.wav" -ar 48000 ".\test.wav"

C:\Users\Raoc\Desktop\beatrice_trainer_webui_v1.0>dir .*.wav
 Volume in drive C has no label.
 Volume Serial Number is DC33-4AC8

Directory of C:\Users\Raoc\Desktop\beatrice_trainer_webui_v1.0

File Not Found

Also here is the full stack trace from running on the desktop for the "Begin Process" button:

C:\Users\Raoc\Desktop\beatrice_trainer_webui_v1.0>runtime\python.exe webui.py
C:\Users\Raoc\Desktop\beatrice_trainer_webui_v1.0\runtime\Lib\site-packages\pyannote\audio\core\io.py:43: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
  torchaudio.set_audio_backend("soundfile")
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
[<gradio.components.dropdown.Dropdown object at 0x0000028CB6E98C10>]
[<gradio.components.dropdown.Dropdown object at 0x0000028CB6E98C10>]
[<gradio.components.dropdown.Dropdown object at 0x0000028CB5AC2D10>]
No language specified, language will be first be detected for each audio file (increases inference time).
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.4.0. To apply the upgrade to your files permanently, run python -m pytorch_lightning.utilities.upgrade_checkpoint C:\Users\Raoc\.cache\torch\whisperx-vad-segmentation.bin
Model was trained with pyannote.audio 0.0.1, yours is 3.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.3.1+cu121. Bad things might happen unless you revert torch to 1.x.
Loaded Whisper model
Traceback (most recent call last):
  File "C:\Users\Raoc\Desktop\beatrice_trainer_webui_v1.0\runtime\Lib\site-packages\whisperx\audio.py", line 61, in load_audio
    out = subprocess.run(cmd, capture_output=True, check=True).stdout
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "subprocess.py", line 571, in run
subprocess.CalledProcessError: Command '['ffmpeg', '-nostdin', '-threads', '0', '-i', 'training\model1\tomi\mono.wav', '-f', 's16le', '-ac', '1', '-acodec', 'pcm_s16le', '-ar', '16000', '-']' returned non-zero exit status 3221225501.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\Raoc\Desktop\beatrice_trainer_webui_v1.0\runtime\Lib\site-packages\gradio\queueing.py", line 536, in process_events
    response = await route_utils.call_process_api(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Raoc\Desktop\beatrice_trainer_webui_v1.0\runtime\Lib\site-packages\gradio\route_utils.py", line 321, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Raoc\Desktop\beatrice_trainer_webui_v1.0\runtime\Lib\site-packages\gradio\blocks.py", line 1935, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Raoc\Desktop\beatrice_trainer_webui_v1.0\runtime\Lib\site-packages\gradio\blocks.py", line 1520, in call_function
    prediction = await anyio.to_thread.run_sync(  # type: ignore
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Raoc\Desktop\beatrice_trainer_webui_v1.0\runtime\Lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Raoc\Desktop\beatrice_trainer_webui_v1.0\runtime\Lib\site-packages\anyio\_backends\_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "C:\Users\Raoc\Desktop\beatrice_trainer_webui_v1.0\runtime\Lib\site-packages\anyio\_backends\_asyncio.py", line 859, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Raoc\Desktop\beatrice_trainer_webui_v1.0\runtime\Lib\site-packages\gradio\utils.py", line 826, in wrapper
    response = f(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^
  File "C:\Users\Raoc\Desktop\beatrice_trainer_webui_v1.0\runtime\Lib\site-packages\gradio\utils.py", line 826, in wrapper
    response = f(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^
  File "C:\Users\Raoc\Desktop\beatrice_trainer_webui_v1.0\webui.py", line 162, in process_proxy
    transcription_result = run_whisperx_transcribe(copied_path)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Raoc\Desktop\beatrice_trainer_webui_v1.0\webui.py", line 70, in run_whisperx_transcribe
    audio = whisperx.load_audio(audio_file_path)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Raoc\Desktop\beatrice_trainer_webui_v1.0\runtime\Lib\site-packages\whisperx\audio.py", line 63, in load_audio
    raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e
RuntimeError: Failed to load audio:
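
A side note on the exit status 3221225501 reported in the CalledProcessError: on Windows, very large exit codes like this are usually NTSTATUS values rather than codes chosen by the program. Below is a minimal sketch for decoding one. The lookup table is a small hand-picked subset, and describe_exit_status is a name made up for this example, not something from whisperx or the web UI.

```python
# A few well-known NTSTATUS values; deliberately NOT a complete list.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000001D: "STATUS_ILLEGAL_INSTRUCTION",
    0xC0000135: "STATUS_DLL_NOT_FOUND",
}


def describe_exit_status(code: int) -> str:
    """Render a process exit status as hex plus a name when it is in the table."""
    # Mask to 32 bits so a signed (negative) representation decodes the same way.
    unsigned = code & 0xFFFFFFFF
    name = KNOWN_NTSTATUS.get(unsigned, "unknown")
    return f"0x{unsigned:08X} ({name})"


print(describe_exit_status(3221225501))
```

This prints 0xC000001D (STATUS_ILLEGAL_INSTRUCTION). If that reading is right, ffmpeg.exe itself crashed on a CPU instruction the machine does not support, which would also fit the empty stderr and the missing test.wav. That is an inference from the code alone; nothing in the thread confirms it.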

Attaching the wav file specs just in case. I've tried a few files, but I tend to save everything as mono 44.1 16k, as that's the default in Audacity:

wav file details
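
Since the attachment is an image, the same specs can also be read in text form with Python's standard wave module. This is a minimal sketch: wav_specs is a hypothetical helper name, and the wave module only reads PCM WAV files, so a 32-bit float export would raise wave.Error instead of returning values.

```python
import wave


def wav_specs(path):
    """Return basic header details for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frame_rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": frame_rate,
            "bits_per_sample": wav.getsampwidth() * 8,  # sample width is reported in bytes
            "duration_s": wav.getnframes() / frame_rate,
        }
```

Calling wav_specs(r"datasets\model1\tomi\mono.wav") from the package folder would show whether the file really is mono and which sample rate and bit depth it was saved with.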

JarodMica commented 1 week ago

> I moved this all to the desktop, then tried the ffmpeg command you sent from the new folder in the terminal. It didn't produce any error, but it also didn't create a new test.wav file, if it was supposed to. Here is that stack trace:
>
> C:\Users\Raoc\Desktop\beatrice_trainer_webui_v1.0>.\ffmpeg.exe -i ".\datasets\model1\tomi\mono.wav" -ar 48000 ".\test.wav"
>
> C:\Users\Raoc\Desktop\beatrice_trainer_webui_v1.0>dir .*.wav
>  Volume in drive C has no label.
>  Volume Serial Number is DC33-4AC8
>
> Directory of C:\Users\Raoc\Desktop\beatrice_trainer_webui_v1.0
>
> File Not Found

So it should actually create a test.wav file in the beatrice directory... my hunch is that this is ffmpeg related.

Let's try that command directly as well, can you run:

ffmpeg.exe -nostdin -threads 0 -i "training\model1\tomi\mono.wav" -f s16le -ac 1 -acodec pcm_s16le -ar 16000 "test.wav"
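
Because the RuntimeError message in the traceback is empty, it may also help to run the same command from Python and look at the return code and stderr separately instead of letting check=True raise. This is a rough sketch: run_and_report is a hypothetical helper (not part of whisperx or the web UI), and the command list is copied from the CalledProcessError in the traceback.

```python
import subprocess

# Command copied from the traceback; "-" sends the decoded audio to stdout.
WHISPERX_STYLE_CMD = [
    "ffmpeg", "-nostdin", "-threads", "0",
    "-i", r"training\model1\tomi\mono.wav",
    "-f", "s16le", "-ac", "1", "-acodec", "pcm_s16le", "-ar", "16000", "-",
]


def run_and_report(cmd):
    """Run cmd without raising on failure; return (returncode, stderr_text).

    returncode is None when the executable could not be started at all.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True)
    except FileNotFoundError as exc:
        return None, str(exc)
    return proc.returncode, proc.stderr.decode(errors="replace")
```

Running code, err = run_and_report(WHISPERX_STYLE_CMD) from the package folder and printing both values should show whether ffmpeg writes anything to stderr before it exits, or whether it dies without any output.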

Also, by chance do you have ffmpeg installed locally on your device?
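
One way to check this: the traceback shows whisperx launching plain ffmpeg rather than .\ffmpeg.exe, so a system-wide install could be picked up instead of the bundled binary. The sketch below lists both candidates. find_ffmpeg_candidates is a made-up helper, and shutil.which only approximates how Windows resolves the name for subprocess, so treat the result as a hint rather than proof.

```python
import shutil
from pathlib import Path


def find_ffmpeg_candidates(package_dir):
    """Report the ffmpeg found via PATH lookup and the one bundled in package_dir."""
    bundled = Path(package_dir) / "ffmpeg.exe"
    return {
        # First "ffmpeg" that shutil.which can find, or None if there is none.
        "on_path": shutil.which("ffmpeg"),
        # The package's own copy, only if the file actually exists.
        "bundled": str(bundled) if bundled.is_file() else None,
    }
```

If on_path points somewhere outside the package folder, there is a second ffmpeg on the machine, and it would be worth testing both binaries with the commands above.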

TomiVolt commented 1 week ago

I may have, but I can't tell now since my Windows 10 install bricked after a Windows update. I think we can close this issue. I reinstalled Windows 10 and got a different issue. I'll create a new one with the new details, hopefully easier to solve. Thanks!

JarodMica / beatrice_trainer_webui

Audio error processing dataset using "Begin Process" button #1